Method of compact peptide vaccines using residue optimization

ABSTRACT

A system for selecting an immunogenic peptide composition comprising a processor and a memory storing processor-executable instructions that, when executed by the processor, cause the processor to create a first peptide set by selecting a plurality of base peptides, wherein at least one peptide of the plurality of base peptides is associated with a disease, create a second peptide set by adding to the first peptide set a modified peptide, wherein the modified peptide comprises a substitution of at least one residue of a base peptide selected from the plurality of base peptides, and create a third peptide set by selecting a subset of the second peptide set, wherein the selected subset of the second peptide set has a predicted vaccine performance, wherein the predicted vaccine performance has a population coverage above a predetermined threshold, and wherein the subset comprises at least one peptide of the second peptide set.

This application is a continuation of U.S. application Ser. No. 17/389,875, filed Jul. 30, 2021, which is a continuation of U.S. application Ser. No. 17/114,237, filed Dec. 7, 2020, each of which is incorporated by reference herein in its entirety.

This patent disclosure contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction of the patent document or the patent disclosure as it appears in the U.S. Patent and Trademark Office patent file or records, but otherwise reserves any and all copyright rights.

INCORPORATION BY REFERENCE

All documents cited herein are incorporated herein by reference in their entirety.

SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted electronically in XML format and is hereby incorporated by reference in its entirety. The XML file, created on Sep. 13, 2022, is named 2215269_00124US1_SL.xml and is 1,337,250 bytes in size.

TECHNICAL FIELD

The present invention relates generally to compositions, systems, and methods of peptide vaccines. More particularly, the present invention relates to compositions, systems, and methods of designing peptide vaccines to treat or prevent disease optimized based on predicted population immunogenicity.

BACKGROUND

The goal of a peptide vaccine is to train the immune system to recognize and expand its capacity to engage cells that display target peptides to improve the immune response to cancerous cells or pathogens. A peptide vaccine can also be administered to someone who is already diseased to increase their immune response to a causal cancer, other diseases, or pathogen. Alternatively, a peptide vaccine can be administered to induce the immune system to have therapeutic tolerance to one or more peptides. There exists a need for compositions, systems, and methods of peptide vaccines based on prediction of the target peptides that will be displayed to protect a host from cancer, other disease, or pathogen infection.

SUMMARY OF THE INVENTION

In one aspect, the invention provides for a system for selecting an immunogenic peptide composition comprising a processor, and a memory storing processor-executable instructions that, when executed by the processor, cause the processor to create a first peptide set by selecting a plurality of base peptides, wherein at least one peptide of the plurality of base peptides is associated with a disease, create a second peptide set by adding to the first peptide set a modified peptide, wherein the modified peptide comprises a substitution of at least one residue of a base peptide selected from the plurality of base peptides, and create a third peptide set by selecting a subset of the second peptide set, wherein the selected subset of the second peptide set has a predicted vaccine performance, wherein the predicted vaccine performance has a population coverage above a predetermined threshold, and wherein the subset comprises at least one peptide of the second peptide set.

In some embodiments, the plurality of base peptides of the first peptide set is derived from a target protein, wherein the target protein is a tumor neoantigen or a pathogen proteome. In some embodiments, selecting the plurality of base peptides to create the first peptide set comprises sliding a window of size n across an amino acid sequence encoding the target protein, wherein n is between about 8 amino acids and about 25 amino acids in length, and wherein n is a length of each peptide of the plurality of base peptides of the first peptide set. In some embodiments, a peptide of the plurality of base peptides binds to an HLA class I molecule or an HLA class II molecule. In some embodiments, the substitution of the at least one residue comprises substituting an amino acid at an anchor residue position for a different amino acid at the anchor residue position. In some embodiments, the system further comprises filtering the first peptide set to exclude a peptide with a predicted binding core that contains a target residue in an anchor position. In some embodiments, the second peptide set comprises the first peptide set. In some embodiments, the prediction to be bound by the one or more HLA alleles is computed using a binding affinity of less than about 1000 nM. In some embodiments, the predicted vaccine performance is determined by computing a plurality of peptide-HLA immunogenicities of the third peptide set to at least one HLA allele. In some embodiments, each peptide-HLA immunogenicity of the plurality of peptide-HLA immunogenicities of the third peptide set is based on a predicted binding affinity of less than about 500 nM. In some embodiments, the predicted vaccine performance is based on a population coverage, wherein the population coverage is computed based on a frequency of an HLA haplotype in a human population. In some embodiments, the predicted vaccine performance is based on a population coverage, wherein the population coverage is computed based on a frequency of at least two HLA alleles in a human population. In some embodiments, the plurality of base peptides is present in a single subject. In some embodiments, the predicted vaccine performance is an expected number of peptide-HLA hits. In some embodiments, the disease is cancer, and wherein the cancer is selected from the group consisting of pancreas, colon, rectum, kidney, bronchus, lung, uterus, cervix, bladder, liver, and stomach. In some embodiments, the plurality of base peptides of the first peptide set comprises at least one self-peptide.

In another aspect, the invention provides for a non-transitory computer-readable storage medium comprising computer-readable instructions for determining an immunogenic peptide composition that, when executed by a processor, cause the processor to create a first peptide set by selecting a plurality of base peptides, wherein a first base peptide and a second base peptide of the plurality of base peptides are each scored for binding by two or more HLA alleles, wherein the first base peptide and the second base peptide are each predicted to be bound by one or more HLA alleles, and wherein the first base peptide and the second base peptide are associated with a disease, create a second peptide set comprising the first base peptide, the second base peptide, a first modified peptide, and a second modified peptide, wherein the first modified peptide comprises a substitution of at least one residue of the first base peptide, and wherein the second modified peptide comprises a substitution of at least one residue of the second base peptide, and create a third peptide set by selecting a subset of the second peptide set, wherein the selected subset of the second peptide set has a predicted vaccine performance, and wherein the predicted vaccine performance is a function of a peptide-HLA immunogenicity of at least one peptide of the third peptide set with respect to the two or more HLA alleles.

In some embodiments, the plurality of base peptides of the first peptide set is derived from a target protein, wherein the target protein is a tumor neoantigen or a pathogen proteome. In some embodiments, selecting the plurality of base peptides to create the first peptide set comprises sliding a window of size n across an amino acid sequence encoding the target protein, wherein n is between about 8 amino acids and about 25 amino acids in length, and wherein n is a length of each peptide of the plurality of base peptides of the first peptide set. In some embodiments, a peptide of the plurality of base peptides binds to an HLA class I molecule or an HLA class II molecule. In some embodiments, the substitution of the at least one residue comprises substituting an amino acid at an anchor residue position for a different amino acid at the anchor residue position. In some embodiments, the non-transitory computer-readable storage medium further comprises filtering the first peptide set to exclude a peptide with a predicted binding core that contains a target residue in an anchor position. In some embodiments, the second peptide set comprises the first peptide set. In some embodiments, the prediction to be bound by the two or more HLA alleles is computed using a binding affinity of less than about 1000 nM. In some embodiments, the plurality of base peptides of the first peptide set comprises at least one self-peptide.

In another aspect, the invention provides for a system for selecting an immunogenic peptide composition comprising a processor, and a memory storing processor-executable instructions that, when executed by the processor, cause the processor to create a first peptide set by selecting a plurality of base peptides, wherein a first base peptide of the plurality of base peptides is scored for binding by three or more HLA alleles, wherein the first base peptide is predicted to be bound by one or more HLA alleles, and wherein the first base peptide is associated with a disease, create a second peptide set comprising the first base peptide and a modified peptide, wherein the modified peptide comprises a substitution of at least one residue of the first base peptide, and create a third peptide set by selecting a subset of the second peptide set, wherein the selected subset of the second peptide set has a predicted vaccine performance, and wherein the predicted vaccine performance is a function of a peptide-HLA immunogenicity of at least one peptide of the third peptide set with respect to the three or more HLA alleles.

In some embodiments, the first base peptide is scored for binding based on data obtained from experimental assays. In some embodiments, the predicted vaccine performance includes a peptide-HLA immunogenicity of the modified peptide bound to the first HLA allele of the one or more HLA alleles if the first base peptide is predicted to be bound to the first HLA allele of the one or more HLA alleles with a first binding core, wherein the first binding core is a binding core of the first base peptide, wherein the first binding core is identical to a second binding core, and wherein the second binding core is a binding core of the modified peptide bound to the first HLA allele.

In another aspect, the invention provides for a non-transitory computer-readable storage medium comprising computer-readable instructions for determining an immunogenic peptide composition that, when executed by a processor, cause the processor to create a first peptide set by selecting a plurality of base peptides, wherein at least one peptide of the plurality of base peptides is associated with a disease, create a second peptide set comprising a first base peptide selected from the first base peptide set and a modified peptide, wherein the modified peptide comprises a substitution of at least one residue of the first base peptide, and create a third peptide set by selecting a subset of the second peptide set, wherein the selected subset of the second peptide set has a predicted vaccine performance, wherein the predicted vaccine performance has an expected number of peptide-HLA hits above a predetermined threshold, and wherein the subset comprises at least one peptide of the second peptide set.

In some embodiments, the first base peptide binds to an HLA class I molecule or an HLA class II molecule.

In another aspect, the invention provides for a system for selecting an immunogenic peptide composition comprising a processor, and a memory storing processor-executable instructions that, when executed by the processor, cause the processor to create a first peptide set by selecting a first plurality of peptides, wherein the first plurality of peptides comprises a plurality of target peptides that are associated with a first disease, and wherein the first peptide set has a first predicted vaccine performance value, create a second peptide set by selecting a second plurality of peptides, wherein the second plurality of peptides comprises a plurality of target peptides that are associated with a second disease, and wherein the second peptide set has a second predicted vaccine performance value, create a first weighted peptide set by multiplying a first weight by the first predicted vaccine performance value, create a second weighted peptide set by multiplying a second weight by the second predicted vaccine performance value, and create a third peptide set by combining the first weighted peptide set and the second weighted peptide set.

In some embodiments, the first predicted vaccine performance value and the second predicted vaccine performance value are computed based on a population coverage of a vaccine. In some embodiments, the first predicted vaccine performance value and the second predicted vaccine performance value are computed based on an expected number of peptide-HLA hits. In some embodiments, the first plurality of peptides is derived from a tumor neoantigen or a pathogen proteome. In some embodiments, the second plurality of peptides is derived from a tumor neoantigen or a pathogen proteome. In some embodiments, the first disease is cancer, and wherein the cancer is selected from the group consisting of pancreas, colon, rectum, kidney, bronchus, lung, uterus, cervix, bladder, liver, and stomach. In some embodiments, the second disease is cancer, and wherein the cancer is selected from the group consisting of pancreas, colon, rectum, kidney, bronchus, lung, uterus, cervix, bladder, liver, and stomach. In some embodiments, the first plurality of peptides comprises a peptide that binds to an HLA class I molecule or an HLA class II molecule. In some embodiments, the second plurality of peptides comprises a peptide that binds to an HLA class I molecule or an HLA class II molecule.

BRIEF DESCRIPTION OF THE DRAWINGS

The following figures depict illustrative embodiments of the invention.

FIG. 1 is a flow chart of a vaccine optimization method.

FIG. 2 is a flow chart of a vaccine optimization method with seed set compression.

FIG. 3 shows predicted population coverage for single target MHC class I vaccines by vaccine size for KRAS G12D, KRAS G12V, KRAS G12R, KRAS G12C, and KRAS G13D targets.

FIG. 4 shows predicted population coverage for single target MHC class II vaccines by vaccine size for KRAS G12D, KRAS G12V, KRAS G12R, KRAS G12C, and KRAS G13D targets.

FIG. 5 shows probabilities of disease presentations for pancreas, colon/rectum, and bronchus/lung and respective probabilities of target presentations for KRAS G12D, KRAS G12V, and KRAS G12R targets.

FIG. 6 is a flow chart for multiple target (combined) vaccine optimization methods.

FIG. 7 shows predicted population coverage for pancreatic cancer multiple target (combined) MHC class I vaccines by vaccine size for KRAS G12D, KRAS G12V, and KRAS G12R targets.

FIG. 8 shows predicted population coverage for pancreatic cancer multiple target (combined) MHC class II vaccines by vaccine size for KRAS G12D, KRAS G12V, and KRAS G12R targets.

FIG. 9 shows an example Python implementation of the MERGEMULTI function for combined vaccine design procedures.

FIG. 10 shows predicted peptide-HLA hits by vaccine size for a KRAS G12V vaccine for the HLA diplotype HLA-A02:03, HLA-A11:01, HLA-B55:02, HLA-B58:01, HLA-C03:02, HLA-C03:03.

DETAILED DESCRIPTION

In some embodiments, the disclosure provides for peptide vaccines that incorporate peptide sequences that will be displayed by Major Histocompatibility Complex (MHC) molecules on cells and train the immune system to recognize cancer or pathogen diseased cells. In some embodiments, the disclosure provides for peptide vaccines that incorporate peptide sequences that will be displayed by Major Histocompatibility Complex (MHC) molecules on cells to induce therapeutic tolerance in antigen-specific immunotherapy for autoimmune diseases (Alhadj Ali et al., 2017; Gibson et al., 2015). In some embodiments, a peptide vaccine is a composition that consists of one or more peptides. In some embodiments, a peptide vaccine is an mRNA or DNA construct administered for expression in vivo that encodes for one or more peptides.

Peptide display by an MHC molecule is necessary, but not sufficient, for a peptide to be immunogenic and cause the recognition of the resulting peptide-MHC complex by an individual's T cells to trigger T cell activation, expansion, and immune memory. In some embodiments, experimental data from assays such as the ELISPOT (Slota et al., 2011) or the Multiplex Identification of Antigen-Specific T Cell Receptors Using a Combination of Immune Assays and Immune Receptor Sequencing (MIRA) assay (Klinger et al., 2015) is used for scoring peptide display (e.g., binding affinity) by an MHC molecule (e.g., HLA allele). In some embodiments, experimental data from assays such as the ELISPOT (Slota et al., 2011) or the Multiplex Identification of Antigen-Specific T Cell Receptors Using a Combination of Immune Assays and Immune Receptor Sequencing (MIRA) assay (Klinger et al., 2015) can be combined with machine learning based predictions for scoring peptide display (e.g., binding affinity) by an MHC molecule (e.g., HLA allele). In some embodiments, the MHCflurry or NetMHCpan (Reynisson et al., 2020) computational methods (as known in the art) are used to predict MHC class I display of a peptide by an HLA allele (see Table 1). In some embodiments, the NetMHCIIpan computational method (Reynisson et al., 2020) is used to predict MHC class II display of a peptide by an HLA allele (see Table 2).

A peptide is displayed by an MHC molecule when it binds within the groove of the MHC molecule and is transported to the cell surface where it can be recognized by a T cell receptor. A target peptide refers to a foreign peptide or a self-peptide. In some embodiments, a peptide that is part of the normal proteome in a healthy individual is a self-peptide, and a peptide that is not part of the normal proteome is a foreign peptide. Foreign peptides can be generated by mutations in normal self-proteins in tumor cells that create epitopes called neoantigens, or by pathogenic infections. In some embodiments, a neoantigen is any subsequence of a human protein, where the subsequence contains one or more altered amino acids or protein modifications that do not appear in a healthy individual. Therefore, in this disclosure, foreign peptide refers to an amino acid sequence encoding a fragment of a target protein/peptide (or a full-length protein/peptide), the target protein/peptide consisting of: a neoantigen protein, a pathogen proteome, or any other undesired protein that is non-self and is expected to be bound and displayed by an HLA allele.

For example, KRAS is the most frequently mutated oncogene in cancer, but KRAS mutations have been very difficult to treat with small molecule therapeutics. The KRAS protein is part of a signaling pathway that controls cellular growth, and point mutations in the protein can cause constitutive pathway activation and uncontrolled cell growth. Single amino acid KRAS mutations result in minor changes in protein structure, making it difficult to engineer small molecule drugs that recognize a mutant specific binding pocket and inactivate KRAS signaling. KRAS oncogenic mutations include the mutation of position 12 from glycine to aspartic acid (G12D), glycine to valine (G12V), glycine to arginine (G12R), or glycine to cysteine (G12C); or the mutation of position 13 from glycine to aspartic acid (G13D). The corresponding foreign peptides contain these mutations.

A challenge for the design of peptide vaccines is the diversity of human MHC alleles (HLA alleles) that each have specific preferences for the peptide sequences they will display. The Human Leukocyte Antigen (HLA) loci, located within the MHC, encode the HLA class I and class II molecules. There are three classical class I loci (HLA-A, HLA-B, and HLA-C) and three loci that encode class II molecules (HLA-DR, HLA-DQ, and HLA-DP). An individual's HLA type describes the alleles they carry at each of these loci. Peptides of length between about 8 and about 11 residues can bind to HLA class I (or MHC class I) molecules whereas those of length between about 13 and about 25 bind to HLA class II (or MHC class II) molecules (Rist et al., 2013; Chicz et al., 1992). Human populations that originate from different geographies have differing frequencies of HLA alleles, and these populations exhibit linkage disequilibrium between HLA loci that results in population specific haplotype frequencies. In some embodiments, methods are disclosed for creating effective vaccines that include consideration of the HLA allelic frequency in the target population, as well as linkage disequilibrium between HLA genes, to achieve a set of peptides that is likely to be robustly displayed.

The present disclosure provides for compositions, systems, and methods of vaccine designs that produce immunity to single or multiple targets. In some embodiments, a target is a neoantigen protein sequence, a pathogen proteome, or any other undesired protein sequence that is non-self and is expected to be bound and displayed by an HLA molecule (also referred to herein as an HLA allele). When a target is present in an individual, it may result in multiple peptide sequences that are displayed by a variety of HLA alleles. In some embodiments, it may be desirable to create a vaccine that includes selected self-peptides, and thus these selected self-peptides are considered to be the target peptides for this purpose.

The term peptide-HLA binding is defined to be the binding of a peptide to an HLA allele, and can either be computationally predicted, experimentally observed, or computationally predicted using experimental observations. The metric of peptide-HLA binding can be expressed as affinity, percentile rank, binary at a predetermined threshold, probability, or other metrics as are known in the art. The term peptide-HLA immunogenicity is defined as the activation of T cells based upon their recognition of a peptide when bound by an HLA allele. Peptide-HLA immunogenicity can vary from individual to individual, and the metric for peptide-HLA immunogenicity can be expressed as a probability, a binary indicator, or other metric that relates to the likelihood that a peptide-HLA combination will be immunogenic. In some embodiments, peptide-HLA immunogenicity is defined as the induction of immune tolerance based upon the recognition of a peptide when bound by an HLA allele. Peptide-HLA immunogenicity can be computationally predicted, experimentally observed, or computationally predicted using experimental observations. In some embodiments, peptide-HLA immunogenicity is based only upon peptide-HLA binding, since peptide-HLA binding is necessary for peptide-HLA immunogenicity. In some embodiments, peptide-HLA immunogenicity data or computational predictions of peptide-HLA immunogenicity can be included and combined with scores for peptide display in the methods disclosed herein. One way of combining the scores is using immunogenicity data for peptides assayed for immunogenicity in diseased or vaccinated individuals, and assigning peptides to the HLA allele that displayed them in the individual by choosing the HLA allele that computational methods predict has the highest likelihood of display. For peptides that are not experimentally assayed, computational predictions of display can be used. In some embodiments, different computational methods of predicting peptide-HLA immunogenicity or peptide-HLA binding can be combined (Liu et al., 2020b). For a given set of peptides and a set of HLA alleles, the term peptide-HLA hits is the number of unique combinations of peptides and HLA alleles that exhibit peptide-HLA immunogenicity or binding at a predetermined threshold. For example, a peptide-HLA hit of 2 can mean that one peptide is predicted to be bound (or trigger T cell activation) by two different HLA alleles, two peptides are predicted to be bound (or trigger T cell activation) by two different HLA alleles, or two peptides are predicted to be bound (or trigger T cell activation) by the same HLA allele. For a given set of peptides and HLA frequencies, HLA haplotype frequencies, or HLA diplotype frequencies, the expected number of peptide-HLA hits is the average number of peptide-HLA hits in each set of HLAs that represent an individual, weighted by their frequency of occurrence.
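The definitions of peptide-HLA hits and of the expected number of peptide-HLA hits can be illustrated with a minimal Python sketch. The function names, the representation of an individual as a set of HLA alleles, and the peptide labels, allele names, and frequencies below are all hypothetical and chosen only for illustration; they are not taken from the disclosure or from any particular software package.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (peptide, HLA allele)


def count_hits(peptides: Iterable[str], alleles: Iterable[str],
               hit_pairs: Set[Pair]) -> int:
    """Count unique (peptide, allele) combinations that meet the threshold.

    hit_pairs holds the pairs already judged to bind (or be immunogenic)
    at a predetermined threshold.
    """
    return sum((p, a) in hit_pairs for p in set(peptides) for a in set(alleles))


def expected_hits(peptides: Iterable[str],
                  genotype_freqs: Dict[FrozenSet[str], float],
                  hit_pairs: Set[Pair]) -> float:
    """Frequency-weighted average of peptide-HLA hits over individuals.

    genotype_freqs maps each set of HLA alleles representing an individual
    to its frequency of occurrence (illustrative values, not real data).
    """
    peptides = list(peptides)
    total = sum(genotype_freqs.values())
    weighted = sum(freq * count_hits(peptides, alleles, hit_pairs)
                   for alleles, freq in genotype_freqs.items())
    return weighted / total


# Hypothetical toy data.
example_hits = {("P1", "A*02:01"), ("P1", "A*11:01"), ("P2", "A*02:01")}
example_freqs = {
    frozenset({"A*02:01", "A*11:01"}): 0.25,
    frozenset({"A*02:01"}): 0.50,
    frozenset({"B*07:02"}): 0.25,
}
example_value = expected_hits(["P1", "P2"], example_freqs, example_hits)
```

In this toy case the three allele sets contribute 3, 2, and 0 hits respectively, so the weighted average is 0.25 x 3 + 0.50 x 2 + 0.25 x 0 = 1.75.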

Since immunogenicity may vary from individual to individual, one method to increase the probability of vaccine efficacy is to use a diverse set of target peptides (e.g., at least two peptides) to increase the chances that some subset of them will be immunogenic in a given individual. Prior research using mouse models has shown that most MHC displayed peptides are immunogenic, but immunogenicity varies from individual to individual as described in Croft et al. (2019). In some embodiments, experimental peptide-HLA immunogenicity data are used to determine which target peptides and their modifications will be effective immunogens in a vaccine.

Considerations for the design of peptide vaccines are outlined in Liu et al., Cell Systems 11, Issue 2, p. 131-146 (Liu et al., 2020) and in Liu et al. (2020b), which are incorporated by reference in their entireties herein.

Certain target peptides may not bind with high affinity to a wide range of HLA molecules. To increase the binding of target peptides to HLA molecules, their amino acid composition can be altered to change one or more anchor residues or other residues. Anchor residues are amino acids that interact with an HLA molecule and have the largest influence on the affinity of a peptide for an HLA molecule. Peptides with altered anchor residues are called heteroclitic peptides. In some embodiments, heteroclitic peptides include target peptides with residue modifications at non-anchor positions. In some embodiments, heteroclitic peptides include target peptides with residue modifications that include unnatural amino acids and amino acid derivatives. Modifications to create heteroclitic peptides can improve the binding of peptides to both MHC class I and MHC class II molecules, and the modifications required can be both peptide and MHC class specific. Since peptide anchor residues face the MHC molecule groove, they are less visible than other peptide residues to T cell receptors. Thus, heteroclitic peptides have been observed to induce a T cell response where the stimulated T cells also respond to unmodified peptides. It has been observed that the use of heteroclitic peptides in a vaccine can improve a vaccine's effectiveness (Zirlik et al., 2006). In some embodiments, the immunogenicity of heteroclitic peptides is experimentally determined and their ability to activate T cells that also recognize the corresponding base (also called seed) peptide of the heteroclitic peptide is determined, as is known in the art. In some embodiments, these assays of the immunogenicity and cross-reactivity of heteroclitic peptides are performed when the heteroclitic peptides are displayed by specific HLA alleles.
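As a minimal sketch of how candidate heteroclitic peptides might be enumerated, the Python below substitutes each of the 20 standard amino acids at chosen positions of a base peptide. The function name and the choice of anchor positions (the second and last residues of a 9-mer, a common convention for MHC class I) are assumptions made for illustration; the disclosure also contemplates non-anchor positions and unnatural amino acids, which this sketch does not cover.

```python
from typing import Iterable, List

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def anchor_variants(peptide: str, anchor_positions: Iterable[int]) -> List[str]:
    """Return all single-residue substitutions at the given 0-based positions.

    Each variant differs from the base peptide at exactly one position.
    """
    variants = set()
    for pos in anchor_positions:
        for aa in AMINO_ACIDS:
            if aa != peptide[pos]:
                variants.add(peptide[:pos] + aa + peptide[pos + 1:])
    return sorted(variants)


# Hypothetical usage: a 9-mer spanning KRAS G12D, with assumed anchors at
# the second and ninth residues (0-based indices 1 and 8).
base_peptide = "VVGADGVGK"
candidates = anchor_variants(base_peptide, [1, 8])
```

With two anchor positions and 19 alternative residues at each, the sketch yields 38 modified peptides, which would then be scored together with the base peptide.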

Peptide Vaccines to Induce Immunity to One or More Targets

In some embodiments, a method is provided for formulating peptide vaccines using a single vaccine design for one or more targets. In some embodiments, a single target is a foreign protein with a specific mutation (e.g., KRAS G12D). In some embodiments, a single target is a self-protein (e.g., a protein that is overexpressed in tumor cells such as cancer/testis antigens). In some embodiments, multiple targets can be used (e.g., both KRAS G12D and KRAS G13D).

In some embodiments, the method includes extracting peptides to construct a candidate set from all target proteome sequences (e.g., entire KRAS G12D protein) as described in Liu et al. (2020).

FIGS. 1 and 2 depict flow charts for example vaccine design methods that can be used for MHC class I or MHC class II vaccine design. In some embodiments, extracted target peptides are of amino acid length of between about 8 and about 10 (e.g., for MHC class I binding (Rist et al., 2013)). In some embodiments, the extracted target peptides presented by MHC class I molecules are longer than 10 amino acid residues, such as 11 residues (Trolle et al., 2016). In some embodiments, extracted target peptides are of length between about 13 and about 25 (e.g., for class II binding (Chicz et al., 1992)). In some embodiments, sliding windows of various size ranges described herein are used over the entire proteome. In some embodiments, other target peptide lengths for MHC class I and class II sliding windows can be utilized. In some embodiments, computational predictions of proteasomal cleavage are used to filter or select peptides in the candidate set. One computational method for predicting proteasomal cleavage is described by Nielsen et al. (2005). In some embodiments, peptide mutation rates, glycosylation, cleavage sites, or other criteria can be used to filter peptides as described in Liu et al. (2020). In some embodiments, peptides can be filtered based upon evolutionary sequence variation above a predetermined threshold. Evolutionary sequence variation can be computed with respect to other species, other pathogens, other pathogen strains, or other related organisms. In some embodiments, a first peptide set is the candidate set.
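The sliding-window extraction described above can be sketched in a few lines of Python. The function names are hypothetical, and the example sequence (the first 22 residues of KRAS carrying the G12D substitution) is supplied here for illustration rather than taken from the Sequence Listing.

```python
from typing import Iterable, List, Tuple

Window = Tuple[int, str]  # (0-based start index in the protein, peptide)


def extract_windows(protein: str, lengths: Iterable[int]) -> List[Window]:
    """Slide a window of each requested length across the protein."""
    windows = []
    for n in lengths:
        # range() is empty when the protein is shorter than the window.
        for start in range(len(protein) - n + 1):
            windows.append((start, protein[start:start + n]))
    return windows


def covering(windows: Iterable[Window], index: int) -> List[Window]:
    """Keep only the windows that span the given 0-based protein index."""
    return [(s, p) for s, p in windows if s <= index < s + len(p)]


# Assumed example: N-terminal KRAS fragment with G12D (position 12 is
# 0-based index 11).
KRAS_G12D_NTERM = "MTEYKLVVVGADGVGKSALTIQ"
class_i_windows = extract_windows(KRAS_G12D_NTERM, range(8, 11))  # lengths 8-10
with_mutation = covering(class_i_windows, 11)
```

For MHC class II candidates the same function could be called with lengths in the range of about 13 to about 25. Restricting to windows that span the mutated residue is one possible pre-filter; filtering on the predicted binding core is a separate step.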

As shown in FIGS. 1-2, in some embodiments, the next step of the method includes scoring the target peptides in the candidate set for peptide-HLA binding to all considered HLA alleles as described in Liu et al. (2020) and Liu et al. (2020b). In some embodiments, a first peptide set is the candidate set after scoring the target peptides. Scoring can be accomplished for human HLA molecules, mouse H-2 molecules, swine SLA molecules, or MHC molecules of any species for which prediction algorithms are available or can be developed. Thus, vaccines targeted at non-human species can be designed with the method. Scoring metrics can include the affinity of a target peptide for an HLA allele in nanomolar, eluted ligand, presentation, and other scores that can be expressed as percentile rank or any other metric. The candidate set may be further filtered to exclude peptides whose predicted binding cores do not contain a particular pathogenic or neoantigen target residue of interest or whose predicted binding cores contain the target residue in an anchor position. The candidate set may also be filtered for target peptides of specific lengths, such as length 9 for MHC class I, for example. In some embodiments, scoring of target peptides is accomplished with experimental data or a combination of experimental data and computational prediction methods. When computational models are unavailable to make peptide-HLA binding predictions for particular (peptide, HLA) pairs, the binding value for such pairs can be defined by the mean, median, minimum, or maximum immunogenicity value taken over supported pairs, a fixed value (such as zero), or inferred using other techniques, including a function of the prediction of the most similar (peptide, HLA) pair available in the scoring model.
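The binding-core filter described above can be sketched as a small predicate in Python. All names are hypothetical. The sketch assumes that the predictor reports the offset of a 9-residue binding core within each peptide, and that the anchor positions of the core are its second and ninth residues; both are illustrative assumptions, since anchor positions depend on the HLA allele and MHC class.

```python
from typing import Iterable


def keep_peptide(peptide_start: int, core_offset: int, target_index: int,
                 core_len: int = 9,
                 anchor_core_positions: Iterable[int] = (1, 8)) -> bool:
    """Decide whether a scored peptide survives the binding-core filter.

    peptide_start: 0-based start of the peptide in the target protein.
    core_offset:   0-based start of the predicted core within the peptide.
    target_index:  0-based index of the target residue in the protein.
    Returns False when the core misses the target residue or places it
    at an (assumed) anchor position.
    """
    pos_in_core = target_index - (peptide_start + core_offset)
    if not 0 <= pos_in_core < core_len:
        return False  # core does not contain the target residue
    return pos_in_core not in set(anchor_core_positions)


# Hypothetical usage: target residue at protein index 11 (e.g., position 12).
kept = keep_peptide(peptide_start=7, core_offset=0, target_index=11)
dropped = keep_peptide(peptide_start=3, core_offset=0, target_index=11)
```

In the usage lines, the first peptide places the target residue at the fifth core position and is kept, while the second places it at the ninth (an assumed anchor) and is excluded.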

In some embodiments, a base set (also referred to as seed set herein) is constructed by selecting peptides from the scored candidate set using individual peptide-HLA binding or immunogenicity criteria (e.g., first peptide set) (FIG. 1). The criteria used for scoring peptide-HLA binding during the scoring procedure can accommodate different goals during the base set selection and vaccine design phases. For example, a target peptide with a peptide-HLA binding affinity of 500 nM may be displayed by an individual that is diseased, but at a lower frequency than a target peptide with a 50 nM peptide-HLA binding affinity. In some embodiments, during the scoring of a candidate set to qualify peptides for membership in the base set as potential immune system targets, 1000 nM or other less constrained affinity criteria than 50 nM may be utilized. During the combinatorial design phase of a vaccine, a more constrained affinity criterion may be used (e.g., when selecting a third peptide set), such as 50 nM, to increase the probability that a vaccine peptide will be found and displayed by HLA molecules. In some embodiments, peptides are scored for third peptide set potential inclusion that have peptide-HLA binding affinities less than about 500 nM. In some embodiments, peptides are selected for the base set that have peptide-HLA binding affinities less than about 1000 nM. Alternatively, predictions of peptide-HLA immunogenicity can be used to qualify target peptides for base set inclusion. In some embodiments, experimental observations of the immunogenicity of peptides in the context of their display by HLA alleles or experimental observation of the binding of peptides to HLA alleles can be used to score peptides for binding to HLA alleles or peptide-HLA immunogenicity. In some embodiments, computational predictions of the immunogenicity of a peptide in the context of display by HLA alleles can be used for scoring, such as the methods of Ogishi et al. (2019).
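The use of a looser affinity threshold for base set membership and a stricter one during vaccine design can be sketched as follows. The function names and the affinity values are hypothetical; the thresholds of 1000 nM and 50 nM are the example values mentioned above, and lower nM means stronger binding.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (peptide, HLA allele)

BASE_SET_THRESHOLD_NM = 1000.0  # looser criterion for base set membership
DESIGN_THRESHOLD_NM = 50.0      # stricter criterion for the design phase


def select_base_set(scores_nm: Dict[Pair, float],
                    threshold_nm: float = BASE_SET_THRESHOLD_NM) -> List[str]:
    """Peptides with affinity below the threshold for at least one allele."""
    return sorted({p for (p, _), nm in scores_nm.items() if nm < threshold_nm})


def binary_hits(scores_nm: Dict[Pair, float],
                threshold_nm: float = DESIGN_THRESHOLD_NM) -> Set[Pair]:
    """(peptide, allele) pairs counted as hits at the given threshold."""
    return {pair for pair, nm in scores_nm.items() if nm < threshold_nm}


# Made-up predicted affinities in nM, for illustration only.
example_scores = {
    ("PEP1", "HLA-A"): 30.0,
    ("PEP1", "HLA-B"): 800.0,
    ("PEP2", "HLA-A"): 600.0,
    ("PEP3", "HLA-A"): 5000.0,
}
base_set = select_base_set(example_scores)
design_hits = binary_hits(example_scores)
```

With these made-up values, PEP1 and PEP2 qualify for the base set at 1000 nM, while only the pair (PEP1, HLA-A) counts as a hit at 50 nM.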

In some embodiments, the method further includes running the OptiVax-Robust algorithm as described in Liu et al. (2020) using the HLA haplotype frequencies of a population on the scored candidate set to construct a base set (also referred to as seed set herein) of target peptides (FIG. 2). In some embodiments, HLA diplotype frequencies can be provided to OptiVax. OptiVax-Robust includes algorithms to eliminate peptide redundancy that arises from the sliding window approach with varying window sizes, but other redundancy elimination measures can be used to enforce minimum edit distance constraints between target peptides in the candidate set. The size of the seed set is determined by a point of diminishing returns of population coverage as a function of the number of target peptides in the seed set. Other criteria can also be used, including a minimum number of vaccine target peptides, maximum number of vaccine target peptides, and desired predicted population coverage. In some embodiments, a predetermined population coverage is less than about 0.4, between about 0.4 and 0.5, between about 0.5 and 0.6, between about 0.6 and 0.7, between about 0.7 and 0.8, between about 0.8 and 0.9, or greater than about 0.9. Another possible criterion is a minimum number of expected peptide-HLA binding hits in each individual. In alternate embodiments, the method further includes running the OptiVax-Unlinked algorithm as described in Liu et al. (2020) instead of OptiVax-Robust.

The OptiVax-Robust method uses binary predictions of peptide-HLA immunogenicity, and these binary predictions can be generated as described in Liu et al. (2020b). The OptiVax-Unlinked method uses the probability of target peptide binding to HLA alleles and can be generated as described in Liu et al. (2020). In some embodiments, OptiVax-Unlinked and EvalVax-Unlinked are used with the probabilities of peptide-HLA immunogenicity. Either method can be used for the purposes described herein, and thus the term “OptiVax” refers to either the Robust or Unlinked method. In some embodiments, the HLA haplotype or HLA allele frequencies of a population provided to OptiVax for vaccine design describe the world's population. In alternative embodiments, the HLA haplotype or HLA allele frequencies of a population provided to OptiVax for vaccine design are specific to a geographic region. In alternative embodiments, the HLA haplotype or HLA allele frequencies of a population provided to OptiVax for vaccine design are specific to an ancestry. In alternative embodiments, the HLA haplotype or HLA allele frequencies of a population provided to OptiVax for vaccine design are specific to a race. In alternative embodiments, the HLA haplotype or HLA allele frequencies of a population provided to OptiVax for vaccine design are specific to individuals with risk factors such as genetic indicators of risk, age, exposure to chemicals, alcohol use, chronic inflammation, diet, hormones, immunosuppression, infectious agents, obesity, radiation, sunlight, or tobacco use. In alternative embodiments, the HLA haplotype or HLA allele frequencies of a population provided to OptiVax for vaccine design are specific to individuals that carry certain HLA alleles. In alternative embodiments, the HLA diplotypes provided to OptiVax for vaccine design describe a single individual, and are used to design an individualized vaccine.

In some embodiments, the base (or seed) set of target peptides (e.g., first peptide set) that results from OptiVax application to the candidate set of target peptides describes a set of unmodified target peptides that represent a possible compact vaccine design (Seed Set in FIG. 2). In some embodiments, the seed set (e.g., first peptide set) is based upon filtering candidate peptides by predicted or observed affinity or immunogenicity with respect to HLA molecules (Seed Set in FIG. 1). However, to improve the display of the target peptides in as wide a range of HLA haplotypes as possible, some embodiments include modifications of the seed (or base) set. In some embodiments, experimental assays can be used to ensure that a modified seed (or base) peptide activates T cells that also recognize the base/seed peptide.

For a given target peptide, the optimal anchor residue selection may depend upon the HLA allele that is binding to and displaying the target peptide and the class of the HLA allele (MHC class I or class II). A seed peptide set (e.g., first peptide set) can become an expanded set by including anchor residue modified peptides of either MHC class I or II peptides (FIGS. 1-2). Thus, one aspect of vaccine design is considering how to select a limited set of heteroclitic peptides that derive from the same target peptide for vaccine inclusion given that different heteroclitic peptides will have different and potentially overlapping population coverages.

In some embodiments, all possible anchor modifications for each target peptide in the base set are considered. There are typically two anchor residues in peptides bound by MHC class I molecules, typically at positions 2 and 9 for 9-mer peptides. At each anchor position, 20 possible amino acids are attempted in order to select the best heteroclitic peptides. Thus, for MHC class I binding, 400 (i.e., 20 amino acids at each of 2 positions = 20²) minus 1 heteroclitic peptides are generated for each base target peptide. There are typically four anchor residues in peptides bound by MHC class II molecules, typically at positions 1, 4, 6, and 9 of the 9-mer binding core. Thus, for MHC class II binding there are 160,000 (i.e., 20 amino acids at each of 4 positions = 20⁴) minus 1 heteroclitic peptides generated for each base target peptide. Other methods, including Bayesian optimization, can be used to select optimal anchor residues to create heteroclitic peptides from each seed (or base) set peptide. Other methods are presented in “Machine learning optimization of peptides for presentation by class II MHCs” by Dai et al. (2020), incorporated in its entirety herein. In some embodiments, the anchor positions are determined by the HLA allele that presents a peptide, and thus the set of heteroclitic peptides includes, for each set of HLA-specific anchor positions, all possible anchor modifications.
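As a worked check of the counts above, assuming the 20 standard amino acids are tried independently at each anchor position and the single unmodified combination (the base peptide itself) is excluded:

$20^{2} - 1 = 399 \ \text{(MHC class I, two anchors)}, \qquad 20^{4} - 1 = 159{,}999 \ \text{(MHC class II, four anchors)}$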

In some embodiments, for all of the target peptides in the base/seed set, new peptide sequences with all possible anchor residue modifications (e.g., MHC class I or class II) are created, resulting in a new heteroclitic base set (Expanded set in FIGS. 1-2) that includes all of the modifications. In some embodiments, for all of the target peptides in the base/seed set, new peptide sequences with anchor residue modifications (e.g., MHC class I or class II) at selected anchor locations are created, resulting in a new heteroclitic base set (Expanded set in FIGS. 1-2) that includes the selected modifications. In some embodiments, the anchor residue positions used for modifying peptides are selected from anchor residue positions determined by the HLA alleles considered during vaccine evaluation. In some embodiments, the heteroclitic base set (Expanded set in FIGS. 1-2) also includes the original seed (or base) set (Seed Peptide Set in FIGS. 1-2). In some embodiments, the heteroclitic base set includes amino acid substitutions at non-anchor residues. In some embodiments, modification of base peptide residues is performed to alter binding to T cell receptors to improve therapeutic efficacy (Candia, et al. 2016). In some embodiments, the heteroclitic base set includes amino acid substitutions of non-natural amino acid analogs. The heteroclitic base set is scored for HLA affinity, peptide-HLA immunogenicity, or other metrics as described herein (another round of Peptide Filtering and Scoring as shown in FIGS. 1-2). The scoring predictions may be further updated for pairs of heteroclitic peptide and HLA allele, eliminating pairs where a heteroclitic peptide is predicted to be displayed by an allele but the seed (or base) peptide from which it was derived is not predicted to be displayed by the allele. The scoring predictions may also be filtered to ensure that predicted binding cores of the heteroclitic peptide displayed by a particular HLA allele align exactly in position with the binding cores of the respective seed (or base) set target peptide for that HLA allele. In some embodiments, the scoring predictions are filtered for an HLA allele to ensure that the heteroclitic peptides considered for that HLA allele are only modified at anchor positions determined by that HLA allele. Scoring produces a metric of peptide-HLA immunogenicity for peptides and HLA alleles that can be either binary, a probability of immunogenicity, or other metric of immunogenicity such as peptide-HLA affinity or percent rank, and can be based on computational predictions, experimental observations, or a combination of both computational predictions and experimental observations. In some embodiments, probabilities of peptide-HLA immunogenicity are utilized by OptiVax-Unlinked. In some embodiments, heteroclitic peptides are included in experimental assays such as MIRA (Klinger et al., 2015) to determine their immunogenicity with respect to specific HLA alleles. In some embodiments, the methods of Liu et al. (2020b) can be used to incorporate MIRA data for heteroclitic peptides into a model of peptide-HLA immunogenicity. In some embodiments, the immunogenicity of heteroclitic peptides is experimentally determined, and their ability to activate T cells that also recognize the corresponding seed (or base) peptide is tested as is known in the art, to qualify the heteroclitic peptide for vaccine inclusion. In some embodiments, these assays of the immunogenicity and cross-reactivity of heteroclitic peptides are performed when the heteroclitic peptides are displayed by specific HLA alleles. In some embodiments, computational predictions of the immunogenicity of a heteroclitic peptide in the context of display by HLA alleles can be used for scoring, such as the methods of Ogishi et al. (2019).

In some embodiments, the next step involves inputting the heteroclitic base set (also referred to as Expanded set as shown in FIGS. 1-2) to OptiVax to select a compact set of vaccine peptides that maximizes predicted vaccine performance (Vaccine Performance Optimization; FIGS. 1-2). In some embodiments, predicted vaccine performance is a function of expected peptide-HLA binding affinity (e.g., a function of the distribution of peptide-HLA binding affinities across all peptide-HLA combinations for a given peptide set, or weighted by the occurrence of the HLA alleles in a population or individual). In some embodiments, predicted vaccine performance is the expected population coverage of a vaccine. In some embodiments, predicted vaccine performance is the expected number of peptide-HLA hits produced by a vaccine in a population or individual. In some embodiments, predicted vaccine performance requires a minimum expected number of peptide-HLA hits (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or more) produced by a vaccine. In some embodiments, predicted vaccine performance is a function of population coverage and the desired expected number of peptide-HLA hits produced by a vaccine. In some embodiments, predicted vaccine performance is a metric that describes the overall immunogenic properties of a vaccine where all of the peptides in the vaccine are scored for peptide-HLA immunogenicity for two or more HLA alleles (e.g., three or more HLA alleles). In some embodiments, predicted vaccine performance excludes immunogenicity contributions by selected HLA alleles above a maximum number of peptide-HLA hits (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or more). In some embodiments, predicted vaccine performance excludes immunogenicity contributions of individual HLA diplotypes above a maximum number of peptide-HLA hits (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or more). In some embodiments, predicted vaccine performance is the fraction of covered HLA alleles, which is the expected fraction of HLA alleles in each individual that have a minimum number of peptides (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or more) with predicted peptide-HLA immunogenicity produced by a vaccine. In some embodiments, predicted vaccine performance is the expected fraction of HLA alleles in a single individual that have a minimum number of peptides (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or more) with predicted peptide-HLA immunogenicity produced by a vaccine.

Predicted vaccine performance refers to a metric. Predicted vaccine performance can be expressed as a single numerical value, a plurality of numerical values, any number of non-numerical values, and a combination thereof. The value or values can be expressed in any mathematical or symbolic term and on any scale (e.g., nominal scale, ordinal scale, interval scale, or ratio scale).

A seed (or base) peptide and all of the modified peptides that are derived from that seed (or base) peptide comprise a single peptide family. In some embodiments, in the component of vaccine performance that is based on peptide-HLA immunogenicity for a given HLA allele, a maximum number of peptides (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or more) that are in the same peptide family are given computational immunogenicity credit for that HLA allele. This limit on peptide family immunogenicity limits the credit caused by many modified versions of the same base peptide. In some embodiments, the methods described herein are included for running OptiVax with an EvalVax objective function that corresponds to a desired metric of predicted vaccine performance. In some embodiments, population coverage means the proportion of a subject population that presents one or more immunogenic peptides that activate T cells responsive to a seed (or base) target peptide. The metric of population coverage is computed using the HLA haplotype frequency in a given population such as a representative human population. In some embodiments, the metric of population coverage is computed using marginal HLA frequencies in a population. Maximizing population coverage means selecting a peptide set (either a base peptide set, a modified peptide set, or a combination of base and modified peptides; e.g., a first peptide set, second peptide set, or third peptide set) that collectively results in the greatest fraction of the population that has at least a minimum number (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or more) of immunogenic peptide-HLA bindings based on proportions of HLA haplotypes in a given population (e.g., representative human population). In some embodiments, this process includes the OptiVax selection of heteroclitic peptides (as described in this disclosure) that activate T cells that respond to their corresponding seed (or base) peptide and the heteroclitic base peptides to improve population coverage. In some embodiments, the seed (or base) target peptides are always included in the final vaccine design. In some embodiments, peptides are only considered as candidates for a vaccine design (e.g., included in a first, second, and/or third peptide set) if they have been observed to be immunogenic in clinical data, animal models, or tissue culture models.
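The interaction between population coverage and the per-family credit limit can be illustrated with a small sketch. It is not the EvalVax objective itself: the function names, the dictionary layouts, the independent pairing of haplotypes into diplotypes, and the choice to count a homozygous allele once are all assumptions made for this example.

```python
from itertools import product

def hits_for_allele(allele, vaccine, binds, family_of, family_cap=1):
    """Count credited hits for one allele, capping the credit per peptide family."""
    per_family = {}
    for pep in vaccine:
        if binds.get((pep, allele), 0):
            fam = family_of[pep]
            per_family[fam] = min(per_family.get(fam, 0) + 1, family_cap)
    return sum(per_family.values())

def population_coverage(vaccine, haplotype_freq, binds, family_of,
                        min_hits=1, family_cap=1):
    """Fraction of the population with at least `min_hits` credited peptide-HLA hits.

    Assumes each individual's two haplotypes are drawn independently from
    `haplotype_freq` ({tuple_of_alleles: frequency}).
    """
    covered = 0.0
    for (hap1, f1), (hap2, f2) in product(haplotype_freq.items(), repeat=2):
        alleles = set(hap1) | set(hap2)  # a homozygous allele is counted once
        hits = sum(hits_for_allele(a, vaccine, binds, family_of, family_cap)
                   for a in alleles)
        if hits >= min_hits:
            covered += f1 * f2
    return covered
```

With two modified versions of the same base peptide that both bind one allele, a cap of 1 credits only a single hit for that allele, so requiring two hits per individual yields no coverage unless peptides from a second family are added.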

Although heteroclitic peptides are used as exemplary embodiments in this disclosure, any modified peptide could be used in place of a heteroclitic peptide. A modified peptide is a peptide that has one or more amino acid substitutions of a target base/seed peptide. The amino acid substitution could be located at an anchor position or any other non-anchor position.

In some embodiments, a candidate vaccine peptide (e.g., a base peptide or a modified peptide) is eliminated from vaccine inclusion if it activates T cells that recognize self-peptides (e.g., this can be achieved at the first and/or second round of Peptide Filtering and Scoring as shown in FIGS. 1-2). In some embodiments, a candidate vaccine peptide (e.g., a base peptide or a modified peptide) is computationally eliminated from vaccine inclusion if its outward facing amino acids when bound by an HLA allele are similar to outward facing self-peptide residues that are presented by the same HLA allele, where similarity can be defined by identity or defined similarity metrics such as BLOSUM matrices (BLOSUM matrices are known in the art). Testing a vaccine peptide for its ability to activate T cells that recognize self-peptides can be experimentally accomplished by the vaccination of animal models followed by ELISPOT or other immunogenicity assay or with human tissue protocols. In both cases, models with HLA alleles that present the vaccine peptide are used. In some embodiments, human peripheral blood mononuclear cells (PBMCs) are stimulated with a vaccine peptide, the T cells are allowed to grow, and then T cell activation with a self-peptide is assayed as described in Tapia-Calle et al. (2019) or other methods as known in the art. In some embodiments, the vaccine peptide is excluded from vaccine inclusion if the T cells are activated by the self-peptide. In some embodiments, computational predictions of the ability of a peptide to activate T cells that also recognize self-peptides can be utilized. These predictions can be based upon the modeling of the outward facing residues from the peptide-HLA complex and their interactions with other peptide residues. In some embodiments, a candidate vaccine peptide (e.g., a base peptide or a modified peptide) is eliminated from vaccine inclusion or experimentally tested for cross-reactivity if it is predicted to activate T cells that also recognize self-peptides based upon the structural similarity of the peptide-MHC complex of the candidate peptide (e.g., a base peptide or a modified peptide) and the peptide-MHC complex of a self-peptide. One method for the prediction of peptide-MHC structure is described by Park et al. (2013).

In some embodiments, a candidate heteroclitic vaccine peptide (e.g., a modified peptide) is eliminated from vaccine inclusion if it does not activate T cells that recognize its corresponding base/seed target peptide (second round of Peptide Filtering and Scoring, FIGS. 1-2). Testing a candidate heteroclitic peptide (e.g., a modified peptide) for its ability to activate T cells that recognize its corresponding seed (or base) target peptide with respect to the same HLA allele can be experimentally accomplished by the vaccination of animal models followed by ELISPOT or other immunogenicity assay or with human tissue protocols. In both cases, models with HLA alleles that present the heteroclitic peptide are used. In some embodiments, human PBMCs are stimulated with the heteroclitic peptide, the T cells are allowed to grow, and then T cell activation with the seed (or base) target peptide is assayed as described in Tapia-Calle et al. (2019) or using other methods known in the art. In some embodiments, computational predictions of the ability of a heteroclitic peptide to activate T cells that also recognize the corresponding seed (or base) target peptide can be utilized. These predictions can be based upon the modeling of the outward facing residues from the peptide-HLA complex and their interactions with other peptide residues. In some embodiments, the structural similarity of the peptide-HLA complex of a heteroclitic peptide and the peptide-HLA complex of the corresponding seed (or base) target is used to qualify heteroclitic peptides for vaccine inclusion or to require experimental immunogenicity testing before vaccine inclusion.

FIG. 3 (MHC class I) and FIG. 4 (MHC class II) show the predicted population coverage of OptiVax-Robust selected single target-specific vaccines with differing numbers of peptides designed for the KRAS mutations G12D, G12V, G12R, G12C, and G13D. FIGS. 4-5 show that as the number of peptides increases for a vaccine, its predicted population coverage increases. The population coverage shown in FIGS. 4-5 is that of those individuals that have the specific mutation that the vaccine is designed to cover. An increase in peptide count will also typically cause the average number of peptide-HLA hits in each individual to increase within the population.

OptiVax can be used to design a vaccine to maximize the fraction/proportion of the population whose HLA molecules are predicted to bind to and display at least p peptides from the vaccine. In some embodiments, this prediction (e.g., scoring) includes experimental immunogenicity data to directly predict at least p peptides will be immunogenic. The number p is input to OptiVax, and OptiVax can be run multiple times with varying values for p to obtain a predicted optimal target peptide set for different peptide counts p. Larger values of p will increase the redundancy of a vaccine at the cost of more peptides to achieve a desired population coverage. In some embodiments, it may not be possible to achieve a given population coverage given a specific heteroclitic base set. In some embodiments, the number p is a function of the desired size of a vaccine.
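The trade-off between p and vaccine size can be seen in a toy selection loop. The sketch below is a simple greedy stand-in and is not the OptiVax algorithm described in Liu et al. (2020); the surrogate objective (frequency-weighted hits capped at p), the function names, and the data layout are assumptions chosen so that the greedy step always has a gradient to follow.

```python
def count_hits(vaccine, alleles, binds):
    """Number of predicted peptide-HLA hits for one genotype (tuple of alleles)."""
    return sum(binds.get((pep, a), 0) for pep in vaccine for a in alleles)

def capped_hits(vaccine, population, binds, p):
    """Frequency-weighted hits per genotype, capped at p (greedy surrogate)."""
    return sum(freq * min(p, count_hits(vaccine, alleles, binds))
               for alleles, freq in population)

def coverage(vaccine, population, binds, p):
    """Fraction of the population with at least p predicted hits."""
    return sum(freq for alleles, freq in population
               if count_hits(vaccine, alleles, binds) >= p)

def greedy_design(candidates, population, binds, p, max_peptides):
    """Add the peptide with the largest surrogate gain until no peptide helps."""
    vaccine = []
    for _ in range(max_peptides):
        remaining = [c for c in candidates if c not in vaccine]
        if not remaining:
            break
        best = max(remaining,
                   key=lambda c: capped_hits(vaccine + [c], population, binds, p))
        gain = (capped_hits(vaccine + [best], population, binds, p)
                - capped_hits(vaccine, population, binds, p))
        if gain <= 0:
            break  # no remaining peptide improves the surrogate objective
        vaccine.append(best)
    return vaccine
```

In a two-genotype toy population, one broadly binding peptide can suffice for p=1, whereas p=2 needs additional peptides to reach the same coverage, which mirrors the redundancy cost noted above.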

The methods described herein can be used to design separate vaccine formulations for MHC class I and class II based immunity.

In some embodiments, this procedure is used to create a vaccine for an individual. In some embodiments, the target peptides present in the individual are determined by sequencing the individual's tumor RNA or DNA, and identifying mutations that produce foreign peptides. One embodiment of this method is described in U.S. Pat. No. 10,738,355, incorporated in its entirety herein. In some embodiments, peptide sequencing methods are used to identify target peptides in the individual. One embodiment of this is described in U.S. Publication No.: 2011/0257890. In some embodiments, the target peptides used for the individual's vaccine are selected when a self-peptide, foreign peptide, or RNA encoding a self-peptide or foreign peptide observed in a specimen from the individual is present at a predetermined level. The target peptides in the individual are used to construct a vaccine as described in the disclosure herein. For vaccine design, OptiVax is provided a diplotype comprising the HLA type of the individual. In an alternative embodiment, the HLA type of an individual is separated into multiple diplotypes with frequencies that sum to one, where each diplotype comprises one or more HLA alleles from the individual and a notation that the other allele positions should not be evaluated. The use of multiple diplotypes will cause OptiVax's objective function to increase the chance that immunogenic peptides will be displayed by all of the constructed diplotypes. This achieves the objective of maximizing the number of distinct HLA alleles in the individual that exhibit peptide-HLA immunogenicity and thus improves the allelic coverage of the vaccine in the individual.

FIG. 10 shows the predicted vaccine performance (predicted number of peptide-HLA hits) of ten example G12V MHC class I vaccines for a single individual with the MHC class I HLA diplotype HLA-A02:03, HLA-A11:01, HLA-B55:02, HLA-B58:01, HLA-C03:02, HLA-C03:03. OptiVax was used to design ten G12V MHC class I vaccines for this HLA diplotype with peptide counts ranging from 1 to 10. For the results in FIG. 10, OptiVax was run with six synthetic diplotypes, each equally weighted, each with one HLA allele from the individual's HLA diplotype, and the other allele positions marked to not be evaluated. The 10 peptide vaccine in FIG. 10 comprises SEQ ID NO: 3 (GAVGVGKSL), SEQ ID NO: 4 (LMVVGAVGV), SEQ ID NO: 7 (VVGAVGVGK), SEQ ID NO: 14 (GPVGVGKSV), SEQ ID NO: 69 (LMVVGAVGI), SEQ ID NO: 72 (LMVVGAVGL), SEQ ID NO: 131 (GAVGVGKSM), SEQ ID NO: 138 (GPVGVGKSA), SEQ ID NO: 142 (VTGAVGVGK), and SEQ ID NO: 198 (VAGAVGVGM). Two peptides, SEQ ID NO: 3 (GAVGVGKSL) and SEQ ID NO: 131 (GAVGVGKSM), are predicted to bind two of the HLA alleles with an affinity of 50 nM or less.
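The construction of equally weighted synthetic diplotypes for a single individual can be sketched as below. The representation (a tuple with `None` marking positions that are not to be evaluated) and the function name are assumptions for illustration; the input format actually accepted by OptiVax is not specified here.

```python
def single_allele_diplotypes(hla_type):
    """Split an individual's HLA type into equally weighted synthetic diplotypes.

    Each synthetic diplotype keeps exactly one allele; None marks an allele
    position that should not be evaluated. The weights sum to one.
    """
    n = len(hla_type)
    diplotypes = []
    for i, allele in enumerate(hla_type):
        masked = tuple(allele if j == i else None for j in range(n))
        diplotypes.append((masked, 1.0 / n))
    return diplotypes

# HLA type of the single-individual example discussed above.
individual = ("HLA-A02:03", "HLA-A11:01", "HLA-B55:02",
              "HLA-B58:01", "HLA-C03:02", "HLA-C03:03")
synthetic = single_allele_diplotypes(individual)
```

Because each synthetic diplotype carries only one allele, a coverage-style objective over these six diplotypes rewards peptides that spread hits across distinct alleles rather than concentrating them on one.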

MHC Class I Vaccine Design Procedure

In some embodiments, MHC class I vaccine design procedures consist of the following computational steps.

In some embodiments, the inputs for the computation are:

-   -   $P_{1 \ldots n}$: Peptide sequence (length n) containing the neoantigen or pathogenic target(s) of interest (e.g., KRAS G12D, KRAS G12V, KRAS G12R, KRAS G12C, KRAS G13D). $P_i$ denotes the amino acid at position i.
    -   t: Position of target mutation in P, t∈[1, . . . , n] (e.g., t=12 for KRAS G12D).
    -   τ₁: Threshold for potential presentation of peptides by MHC for peptide-MHC scoring (e.g., 500 nM binding affinity)
    -   τ₂: Threshold for predicted display of peptides by MHC for peptide-MHC scoring (e.g., 50 nM binding affinity)
    -   ℋ: Set of HLA alleles (for HLA-A, HLA-B, HLA-C loci)
    -   F: ℋ³→ℝ: Population haplotype frequencies (for OptiVax optimization and coverage evaluation).
    -   N: Parameter for EvalVax and OptiVax objective function. Specifies minimum number of predicted per-individual hits for population coverage objective to consider the individual covered. Default=1 (computes P(n≥1) population coverage).

In some embodiments, Peptide-HLA Scoring Functions used are:

-   -   SCOREPOTENTIAL: P×ℋ→{0, 1}: Scoring function mapping a (peptide, HLA allele) pair to a prediction of peptide-HLA display. If predicted affinity ≤τ₁, then returns 1, else returns 0. Options include MHCflurry, NetMHCpan, PUFFIN, and ensembles; alternative metrics or software may also be used, including models calibrated against immunogenicity data.
    -   SCOREDISPLAY: P×ℋ→{0, 1}: Scoring function mapping a (peptide, HLA allele) pair to a prediction of peptide-HLA display. If predicted affinity ≤τ₂, then returns 1, else returns 0. Options include MHCflurry, NetMHCpan, PUFFIN, and ensembles; alternative metrics or software may also be used, including models calibrated against immunogenicity data.

Next, from the seed protein sequence (P), a set 𝒫 of windowed native peptides spanning the protein sequence(s) is constructed. In some embodiments, 9-mers are produced, but this process can be performed with any desired window lengths and the resulting peptide sets combined.

$\mathcal{P} = \{P_{j \ldots j+8} \mid j \in [t-8, \ldots, t],\ j \notin \{t-8, t-1\}\}$

The second condition, j∉{t−8, t−1}, excludes peptides where the mutation at t is in positions P2 or P9 of the windowed 9-mer peptide (i.e., the anchor positions).
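The windowing rule can be sketched in a few lines. The function name and parameters are illustrative, and the example sequence is assumed to be the first 25 residues of KRAS carrying the G12V substitution; positions are 1-indexed to match the notation above.

```python
def class1_windows(protein, t, length=9, anchor_positions=(2, 9)):
    """Return windowed peptides that span 1-indexed position t, skipping windows
    in which t falls on an anchor position of the window."""
    peptides = []
    for j in range(t - (length - 1), t + 1):        # 1-indexed window start
        if j < 1 or j + length - 1 > len(protein):
            continue                                 # window runs off the sequence
        pos_in_window = t - j + 1                    # 1-indexed position of target
        if pos_in_window in anchor_positions:
            continue                                 # j = t-1 (P2) or j = t-8 (P9)
        peptides.append(protein[j - 1 : j - 1 + length])
    return peptides

# Assumed example: N-terminus of KRAS with G12V (V at position 12).
KRAS_G12V = "MTEYKLVVVGAVGVGKSALTIQLIQ"
windows = class1_windows(KRAS_G12V, t=12)
```

Of the nine 9-mers that contain position 12, the two that place the mutation at P2 or P9 are dropped, leaving seven candidates.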

Next, each peptide sequence in 𝒫 is scored against all HLA alleles in ℋ for potential presentation using SCOREPOTENTIAL (with threshold τ₁=500 nM), and the results are stored in a |𝒫|×|ℋ| matrix S:

$S[p,h] = \text{SCOREPOTENTIAL}(p,h) \quad \forall p \in \mathcal{P},\ h \in \mathcal{H}$

-   -   Note that S is a binary matrix where 1 indicates the HLA is predicted to potentially present the peptide, and 0 indicates no potential presentation.

Define Base Set of Peptides B⊆𝒫:

$B = \{p \in \mathcal{P} \mid \exists h \ \text{s.t.}\ S[p,h] = 1\}$

Thus, B contains the native peptides that are predicted to be potentially presented by at least 1 HLA.

Create a Set of all Heteroclitic Peptides B′ Stemming from Peptides in B:

$B^{\prime} = \bigcup\limits_{b \in B} \text{ANCHOR-MODIFIED}(b)$

-   -   where ANCHOR-MODIFIED(b) returns a set of all 399 anchor-modified peptides stemming from b (with all possible modifications to the amino acids at P2 and P9).
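One way to realize ANCHOR-MODIFIED(b) is to enumerate every residue combination at the anchor positions and discard the unmodified base peptide. The sketch below assumes the 20 standard amino acids and 1-indexed anchor positions; the function name and alphabet constant are illustrative.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids (assumed alphabet)

def anchor_modified(base, anchor_positions=(2, 9)):
    """Return all peptides that differ from `base` only at the anchor positions."""
    variants = set()
    for residues in product(AMINO_ACIDS, repeat=len(anchor_positions)):
        chars = list(base)
        for pos, aa in zip(anchor_positions, residues):
            chars[pos - 1] = aa                      # positions are 1-indexed
        variant = "".join(chars)
        if variant != base:                          # exclude the unmodified peptide
            variants.add(variant)
    return variants

heteroclitic = anchor_modified("LVVVGAVGV")
```

For a 9-mer with anchors at P2 and P9 this yields 20² − 1 = 399 peptides, in agreement with the count stated above.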

Next, all heteroclitic candidate peptides (e.g., modified peptides) in B′ are scored against all HLA alleles in ℋ for predicted display using SCOREDISPLAY (with threshold τ₂=50 nM), and the results are stored in a binary |B′|×|ℋ| matrix S₁′:

$S_{1}^{\prime}[b^{\prime},h] = \text{SCOREDISPLAY}(b^{\prime},h) \quad \forall b^{\prime} \in B^{\prime},\ h \in \mathcal{H}$

Next, an updated scoring matrix S₂′ is computed for heteroclitic peptides conditioned on the potential presentation of the corresponding base peptides by each HLA:

$S_{2}^{\prime}[b^{\prime},h] = \begin{cases} S_{1}^{\prime}[b^{\prime},h], & \text{if } S[b,h] = 1 \\ 0, & \text{otherwise} \end{cases} \quad \forall b^{\prime} \in B^{\prime},\ h \in \mathcal{H}$

-   -   where each heteroclitic peptide b′∈B′ is a mutation of base peptide b∈B. This condition enforces that if h was not predicted to potentially present b, then all heteroclitic peptides b′ derived from b will not be displayed by h (even if h would otherwise be predicted to display b′).
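The conditioning step can be written directly from the definition of S₂′. The dictionary-based representation, the `parent_of` mapping from each heteroclitic peptide to its base peptide, and the function name are assumptions for this sketch.

```python
def condition_on_base(s1_prime, s_base, parent_of):
    """Compute S2'[b', h]: keep S1'[b', h] only where the base peptide b of b'
    is predicted to be potentially presented by h (S[b, h] == 1)."""
    s2_prime = {}
    for (b_mod, h), score in s1_prime.items():
        base = parent_of[b_mod]
        s2_prime[(b_mod, h)] = score if s_base.get((base, h), 0) == 1 else 0
    return s2_prime
```

A heteroclitic peptide therefore never receives credit for an allele that is not predicted to present its base peptide, regardless of its own predicted display score.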

In some embodiments, OptiVax-Robust is used to design a final peptide set (e.g., third peptide set) from the union of base peptides and heteroclitic peptides B∪B′ (with corresponding scoring matrices S and S₂′ for B and B′, respectively). Let 𝒱_(k) denote the compact set of vaccine peptides output by OptiVax containing k peptides. Note that 𝒱_(k+1) is not necessarily a superset of 𝒱_(k). (In alternate embodiments, OptiVax can be used to augment the base set B with peptides from B′ using scoring matrix S₂′ to return set 𝒱_(k), and the final vaccine set 𝒱_(k+|B|) consists of peptides B∪𝒱_(k).)

In some embodiments, this procedure is repeated independently for each target of interest, and the resulting independent vaccine sets can be merged into a combined vaccine as described below.

MHC Class II Vaccine Design Procedure

In some embodiments, MHC class II vaccine design procedures consist of the following computational steps.

In some embodiments, the inputs for the computation are:

-   -   $P_{1 \ldots n}$: Peptide sequence(s) (length n) containing the neoantigen(s) of interest (e.g., KRAS G12D, KRAS G12V, KRAS G12R, KRAS G12C, KRAS G13D). $P_i$ denotes the amino acid at position i.
    -   t: Position of target mutation in P, t∈[1, . . . , n] (e.g., t=12 for KRAS G12D).
    -   τ₁: Threshold for potential presentation of peptides by MHC for peptide-MHC scoring (e.g., 500 nM binding affinity)
    -   τ₂: Threshold for predicted display of peptides by MHC for peptide-MHC scoring (e.g., 50 nM binding affinity)
    -   ℋ: Set of HLA alleles (for HLA-DR, HLA-DQ, HLA-DP loci)
    -   F: ℋ³→ℝ: Population haplotype frequencies (for OptiVax optimization and coverage evaluation).
    -   N: Parameter for EvalVax and OptiVax objective function. Specifies minimum number of predicted per-individual hits for population coverage objective to consider the individual covered. Default=1 (computes P(n≥1) population coverage).

In some embodiments, Peptide-HLA Scoring Functions used are:

-   -   SCOREPOTENTIAL: P×ℋ→{0, 1}: Scoring function mapping a (peptide, HLA allele) pair to a prediction of display. If predicted affinity ≤τ₁, then returns 1, else returns 0. Options include NetMHCIIpan, PUFFIN, and ensembles; alternative metrics or software may also be used, including models calibrated against immunogenicity data.
    -   SCOREDISPLAY: P×ℋ→{0, 1}: Scoring function mapping a (peptide, HLA allele) pair to a prediction of peptide-HLA display. If predicted affinity ≤τ₂, then returns 1, else returns 0. Options include NetMHCIIpan, PUFFIN, and ensembles; alternative metrics or software may also be used, including models calibrated against immunogenicity data.
    -   FINDCORE: P×ℋ→[1, . . . , n]: Function mapping a (peptide, HLA allele) pair to a prediction of the 9-mer binding core. The core may be specified as the offset position (index) into the peptide where the core begins.

Next, from the seed protein sequence (P), a set 𝒫 of peptides spanning the protein sequence is constructed. Here, we extract all windowed peptides of length 13-25 spanning the target mutation, but this process can be performed using any desired window lengths (e.g., only 15-mers).

$\mathcal{P} = \bigcup\limits_{k \in [13, \ldots, 25]} \mathcal{P}_{k}$

$\mathcal{P}_{k} = \{P_{j \ldots j+(k-1)} \mid j \in [t-(k-1), \ldots, t]\}$

-   -   where 𝒫_(k) contains all sliding windows of length k, which are combined to form 𝒫. Note that here (unlike MHC class I), no peptides are excluded based on binding core or anchor residue positions (for MHC class II, filtering is performed as described in this disclosure).
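The class II windowing can be sketched in the same way as for class I, now over a range of window lengths and without any anchor-based exclusion. The function name, the returned (start, peptide) pairs, and the example sequence (assumed to be the first 25 residues of KRAS with G12V) are illustrative choices.

```python
def class2_windows(protein, t, lengths=range(13, 26)):
    """Return (start, peptide) pairs for every window of each length in `lengths`
    that spans 1-indexed position t and lies fully inside the sequence."""
    windows = []
    for k in lengths:
        for j in range(t - (k - 1), t + 1):          # 1-indexed window start
            if j >= 1 and j + k - 1 <= len(protein):
                windows.append((j, protein[j - 1 : j - 1 + k]))
    return windows

# Assumed example: N-terminus of KRAS with G12V (V at position 12).
KRAS_G12V = "MTEYKLVVVGAVGVGKSALTIQLIQ"
all_windows = class2_windows(KRAS_G12V, t=12)
```

Keeping the start offset alongside each peptide makes it straightforward to locate the target residue within the peptide when the binding core is later predicted per allele.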

Next, each peptide sequence in 𝒫 is scored against all HLA alleles in ℋ for potential presentation using SCOREPOTENTIAL (with threshold τ₁=500 nM), and the results are stored in a |𝒫|×|ℋ| matrix S₁:

$S_{1}[p,h] = \text{SCOREPOTENTIAL}(p,h) \quad \forall p \in \mathcal{P},\ h \in \mathcal{H}$

-   -   Note that S₁ is a binary matrix where 1 indicates the HLA is predicted to potentially present the peptide, and 0 indicates no potential presentation.

For each (peptide, HLA allele) pair (p, h), the 9-mer binding core is identified/predicted using FINDCORE. The predicted binding core is recorded in a matrix C:

$C[p,h] = \text{FINDCORE}(p,h)\ \forall p \in \mathcal{P}, h \in \mathcal{H}$

Next, an updated scoring matrix S₂ is computed for native peptides in 𝒫:

$S_{2}[p,h] = \begin{cases} S_{1}[p,h], & \text{if } C[p,h] \text{ specifies } P_{t} \text{ at a non-anchor position inside the core} \\ 0, & \text{otherwise} \end{cases} \quad \forall p \in \mathcal{P}, h \in \mathcal{H}$

-   -   where P_(t) is the target residue of interest (e.g., the mutation site of KRAS G12D). This condition enforces the target residue to fall within the binding core at a non-anchor position for all (peptide, HLA allele) pairs with non-zero scores in S₂, and allows the binding core to vary by allele per peptide (as the binding cores of a particular peptide may differ based on the HLA allele presenting the peptide). Thus, for each pair (p, h), if the predicted binding core C[p, h] specifies the target residue P_(t) at an anchor position (P1, P4, P6, or P9 of the 9-mer core), or if P_(t) is not contained within the binding core, then S₂[p, h]=0. In an alternate embodiment, P_(t) can be located outside of the core or inside the core in a non-anchor position.
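The S₂ condition reduces to a small position check, sketched below. The helper names are hypothetical, and the sketch assumes that both the core start returned by FINDCORE and the target position are 1-based indices into the same peptide.

```python
ANCHOR_CORE_POSITIONS = {1, 4, 6, 9}  # P1, P4, P6, P9 of the 9-mer core


def target_core_position(core_start, target_pos):
    """1-based position of the target residue within the 9-mer core, or None if outside.

    core_start and target_pos are assumed to be 1-based indices into the peptide.
    """
    pos = target_pos - core_start + 1
    return pos if 1 <= pos <= 9 else None


def filter_score(s1, core_start, target_pos):
    """Return s1 only when the target sits at a non-anchor position inside the core."""
    pos = target_core_position(core_start, target_pos)
    if pos is None or pos in ANCHOR_CORE_POSITIONS:
        return 0
    return s1
```

For the Table 2 seed `EYKLVVVGADGVGKS` with core `LVVVGADGV` (core start 4, mutated D at peptide position 10), the target lands at core position 7, a non-anchor position, so the S₁ score is kept.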

Next, OptiVax-Robust is run with peptides 𝒫 and scoring matrix S₂ to identify a non-redundant base set of peptides B⊆𝒫. (In alternate embodiments, B can be chosen as the entire set 𝒫 rather than identifying a non-redundant base set.)

Next, a set of all heteroclitic peptides B′ is created stemming from peptides in B:

$B^{\prime} = \bigcup\limits_{b \in B} \{ \text{ANCHOR-MODIFIED}(b,c)\ \forall c \mid \exists h \text{ s.t. } S_{2}[b,h] = 1 \}$

-   -   where ANCHOR-MODIFIED(b,c) returns a set of all 20⁴−1        anchor-modified peptides stemming from b with all possible        modifications to the amino acids at P1, P4, P6, and P9 of the        9-mer binding core c. Thus, for each base peptide b, the        heteroclitic set B′ contains all anchor-modified peptides b′        with modifications to all unique cores of b identified for any        HLA alleles that potentially present b with a valid core        position as indicated by scoring matrix S₂.
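One way to enumerate the anchor-modified set for a single (peptide, core) pair is shown below. It is a sketch under stated assumptions: the function name `anchor_modified` is hypothetical, the alphabet is taken to be the 20 standard amino acids, and the core start is a 1-based index into the peptide. Excluding the unmodified peptide gives 20⁴−1 = 159,999 variants.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids (assumed alphabet)
ANCHOR_OFFSETS = (0, 3, 5, 8)         # P1, P4, P6, P9 as 0-based offsets into the core


def anchor_modified(peptide, core_start):
    """Yield every peptide that differs from `peptide` only at the four core anchors.

    core_start is the 1-based position in the peptide where the 9-mer core begins.
    The unmodified peptide itself is skipped.
    """
    residues = list(peptide)
    anchor_idx = [core_start - 1 + off for off in ANCHOR_OFFSETS]
    for combo in product(AMINO_ACIDS, repeat=4):
        variant = residues.copy()
        for i, aa in zip(anchor_idx, combo):
            variant[i] = aa
        candidate = "".join(variant)
        if candidate != peptide:
            yield candidate
```

Because the result is a generator, candidates can be streamed into a scoring step without holding all 159,999 strings in memory. As a sanity check, the Table 2 peptide `EYKFVVFGSDGAGKS` (SEQ ID NO: 42) is among the variants of the seed `EYKLVVVGADGVGKS` with core start 4.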

Next, all heteroclitic candidate peptides (e.g., modified peptides) in B′ are scored against all HLA alleles in ℋ for predicted display using SCOREDISPLAY (with threshold τ₂=50 nM), and the results are stored in a binary |B′|×|ℋ| matrix S₁′:

$S_{1}^{\prime}[b^{\prime},h] = \text{SCOREDISPLAY}(b^{\prime},h)\ \forall b^{\prime} \in B^{\prime}, h \in \mathcal{H}$

For each (heteroclitic peptide, HLA allele) pair (b′, h), the 9-mer binding core is identified/predicted using FINDCORE. The predicted binding core is recorded in a matrix C′:

$C^{\prime}[b^{\prime},h] = \text{FINDCORE}(b^{\prime},h)\ \forall b^{\prime} \in B^{\prime}, h \in \mathcal{H}$

An updated scoring matrix S₂′ is computed for heteroclitic peptides, conditioned on the identified binding cores of the heteroclitic and base peptides occurring at the same offset for a particular HLA:

$S_{2}^{\prime}[b^{\prime},h] = \begin{cases} S_{1}^{\prime}[b^{\prime},h], & \text{if } C^{\prime}[b^{\prime},h] = C[b,h] \\ 0, & \text{otherwise} \end{cases} \quad \forall b^{\prime} \in B^{\prime}, h \in \mathcal{H}$

-   -   where each heteroclitic peptide b′∈B′ is a mutation of base        peptide b∈B. This condition enforces the binding core of the        heteroclitic peptide b′ to be at the same relative position as        the base peptide b, and, implicitly, enforces that the target        residue P_(t) still falls in a non-anchor position within the        9-mer binding core (Step 3).

An updated scoring matrix S₃′ is computed for heteroclitic peptides, conditioned on the potential presentation of the corresponding base peptides by each HLA:

$S_{3}^{\prime}[b^{\prime},h] = \begin{cases} S_{2}^{\prime}[b^{\prime},h], & \text{if } S[b,h] = 1 \\ 0, & \text{otherwise} \end{cases} \quad \forall b^{\prime} \in B^{\prime}, h \in \mathcal{H}$

-   -   where each heteroclitic peptide b′∈B′ is a mutation of base        peptide b∈B. This condition enforces that if h was not predicted        to display b, then all heteroclitic peptides b′ derived from b        will not be displayed by h (even if h would otherwise be        predicted to display b′).
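The two conditioning steps for heteroclitic peptides (same core offset as the base peptide, then presentation of the base peptide) can be sketched together. All names here are hypothetical, matrices are represented as dictionaries keyed by (peptide, allele), and `s_base` stands in for the base-peptide matrix written as S[b, h] in the equation; which base matrix is intended is not spelled out there, so the caller supplies it.

```python
def heteroclitic_scores(s1_prime, core_prime, core_base, s_base, parent):
    """Compute S2' and S3' for heteroclitic peptides.

    s1_prime[(b_mod, h)]   -> 0/1 display score of the heteroclitic peptide
    core_prime[(b_mod, h)] -> predicted core offset of the heteroclitic peptide
    core_base[(b, h)]      -> predicted core offset of the base peptide
    s_base[(b, h)]         -> 0/1 presentation score of the base peptide
    parent[b_mod]          -> base peptide b from which b_mod was derived
    """
    s2_prime, s3_prime = {}, {}
    for (b_mod, h), score in s1_prime.items():
        b = parent[b_mod]
        # S2': keep the score only if the core did not shift relative to the base peptide
        same_core = core_prime[(b_mod, h)] == core_base[(b, h)]
        s2_prime[(b_mod, h)] = score if same_core else 0
        # S3': additionally require that the base peptide is presented by h
        s3_prime[(b_mod, h)] = s2_prime[(b_mod, h)] if s_base[(b, h)] == 1 else 0
    return s2_prime, s3_prime
```

Keeping S₂′ and S₃′ as separate outputs mirrors the text, where S₂′ is also used on its own in an alternate embodiment.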

OptiVax-Robust is used to design a final peptide set (e.g., third peptide set) from the union of base peptides and heteroclitic peptides B∪B′ (with corresponding scoring matrices S₂ and S₃′ for B and B′, respectively). Let 𝒱_(k) denote the compact set of vaccine peptides output by OptiVax containing k peptides. Note that 𝒱_(k+1) is not necessarily a superset of 𝒱_(k). (In alternate embodiments, OptiVax can be used to augment the base set B with peptides from B′ using scoring matrix S₂′ to return set 𝒱_(k), and the final vaccine set 𝒱_(k+|B|) consists of peptides B∪𝒱_(k).)

In some embodiments, this procedure is repeated independently for each single target of interest, and the resulting independent vaccine sets can be merged into a combined vaccine as described below.

Methods for Combining Multiple Vaccines

The above-described methods will produce an optimized target peptide set (e.g., third peptide set) for one or more individual targets. In some embodiments, a method is provided for designing separate vaccines for MHC class I and class II based immunity for multiple targets (e.g., two or more targets such as KRAS G12D and KRAS G12V).

In some embodiments, a method is disclosed for producing a combined peptide vaccine for multiple targets by using a table of presentations for a disease that is based upon empirical data from sources such as The Cancer Genome Atlas (TCGA). FIG. 5 shows one embodiment for factoring disease presentation type probabilities (pancreatic cancer, colon/rectum cancer, and bronchus/lung cancer) by probability, for each disease presentation, of target presented for various KRAS mutation targets (KRAS G12D, KRAS G12V, and KRAS G12R). A presentation is a unique set of targets that are presented by one form of a disease (e.g., distinct type of cancer as shown in FIG. 5). For each presentation, FIG. 5 shows an example of the probability of that presentation, and the probability that a given target is observed. For a given presentation, there can be one or more targets, each having a probability. In some embodiments, the method for multi-target vaccine design will allocate peptide resources for inducing disease immunity based on the presentation and respective target probabilities as shown in FIG. 5, for example. In some embodiments, presentations correspond to the prevalence of targets in different human populations or different risk groups. The probability of a target in a population is computed by summing, over all possible presentations, the probability of that presentation times the probability of the target in that presentation.
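The final sentence above describes a simple marginalization over presentations. A minimal sketch follows; the function name and the dictionary layout are assumptions, and any numbers passed to it would come from a presentation table such as the one in FIG. 5 (none are reproduced here).

```python
def target_probabilities(presentation_prob, target_given_presentation):
    """Probability of each target in the population.

    presentation_prob[pres]                 -> probability of that presentation
    target_given_presentation[pres][target] -> probability of the target in that presentation
    """
    totals = {}
    for pres, p_pres in presentation_prob.items():
        for target, p_target in target_given_presentation[pres].items():
            # Sum over presentations: P(presentation) * P(target | presentation)
            totals[target] = totals.get(target, 0.0) + p_pres * p_target
    return totals
```

With purely hypothetical inputs, two equally likely presentations A and B where target X has probability 0.4 in A and 0.1 in B give X an overall probability of 0.5·0.4 + 0.5·0.1 = 0.25.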

Referring to FIG. 6, in some embodiments, the method first includes designing an individual peptide vaccine for each target to create a combined vaccine design for multiple targets. This initially results in sets of target-specific vaccine designs. In some embodiments, the marginal predicted vaccine performance of each target-specific vaccine at size k is defined by the predicted vaccine performance at size k minus the predicted vaccine performance of the vaccine at size k minus one (see FIGS. 3-4). The composition of a vaccine may change as the number of peptides used in the vaccine increases, and thus for computing contributions to a combined vaccine the marginal predicted vaccine performance of each target-specific vaccine is used instead of a specific set of peptides.

In some embodiments, the weighted marginal predicted vaccine performance of a target-specific vaccine design for each target-specific vaccine size is computed as shown in FIG. 6. For a given target-specific vaccine size, its weighted predicted vaccine performance is computed by multiplying its predicted vaccine performance by the probability of the target in the population (e.g., by using values as shown in FIG. 5). The marginal weighted predicted vaccine performance for a target-specific vaccine is its weighted coverage at size k minus its weighted coverage at size k minus one (e.g., see FIGS. 3-4). The marginal weighted predicted vaccine performance of a target-specific vaccine of size one is its weighted predicted vaccine performance. The marginal weighted predicted vaccine performances for all vaccines are combined into a single list, and the combined list is sorted from largest to smallest by the weighted marginal predicted vaccine performances of the target-specific vaccines as shown in FIG. 6. The combined vaccine of size n is then determined by the first n elements of this list. The peptides for the combined vaccine are determined by the individual target vaccines whose sizes add to n and whose weighted predicted vaccine performances sum to the same total as the first n elements of the sorted list. This maximizes the predicted vaccine performance of the combined vaccine of size n.

In some embodiments, the combined multiple target vaccine can be designed based on its overall predicted coverage for the disease, depending on the presentation table used (e.g., see FIG. 5), on its predicted coverage for a specific indication, and/or on its predicted coverage for a specific target, by adjusting the weighting used for predicted vaccine performance accordingly. Once a desired level of coverage is selected, the peptides of the combined vaccine are determined by the contributions of target-specific designs. For example, if the combined vaccine includes a target-specific vaccine of size k, then the vaccine peptides for this target at size k are used in the combined vaccine.

As an example of one embodiment, FIG. 5 shows three mutations (KRAS G12D, G12V, and G12R) and their respective probabilities of occurring in an individual with pancreatic cancer. FIG. 3 (MHC class I) and FIG. 4 (MHC class II) show the population coverage of target-specific vaccines for the KRAS G12D, G12V, G12R, G12C, and G13D targets using the methods for vaccines described herein. The marginal population coverage of each target-specific vaccine at a given vaccine size is the improvement in coverage between the size less one and that size. The coverage with no peptides is zero. The marginal coverage of each target-specific vaccine is multiplied by the probability of the target in the population as determined by the proportions shown in FIG. 5 for the pancreas (pancreatic cancer). These weighted marginal coverages of all target-specific vaccines are sorted to determine the best target-specific compositions, and the resulting list describes the composition of a combined vaccine at each size k by taking the first k elements of the list. As an example of one embodiment, FIG. 7 (MHC Class I) and FIG. 8 (MHC Class II) show the target-specific contributions at each vaccine size for a combined KRAS vaccine for the three mutations KRAS G12D, G12V, and G12R. The combined vaccine methods described herein were used to compute the examples in FIGS. 7 and 8. At each combined vaccine size, different components of the target-specific vaccines are utilized. Table 1 (below) contains the peptides present in independent (single target) and combined (multiple target) MHC class I vaccine designs for the KRAS G12D, G12V, G12R, G12C, and G13D targets. Table 2 (below) contains the peptides present in independent (single target) MHC class II vaccine designs for the KRAS G12D, G12V, G12R, G12C, and G13D targets, and any subset of the individual/single target vaccines can be combined to create an MHC class II vaccine for two or more targets. For alternate embodiments, the Sequence Listing provides heteroclitic peptides useful in MHC class I vaccines for the KRAS G12D, G12V, G12R, G12C, and G13D targets.

Combined Vaccine Design Procedure

In some embodiments, the procedure described herein is used to combine individual compact vaccines optimized for different targets into a single optimized combined vaccine.

In some embodiments, the computational inputs for the procedure are:

-   -   𝒯: Set of neoantigen or pathogenic targets of interest (e.g., KRAS G12D, KRAS G12V, KRAS G12R)
    -   𝒱: Vaccine sets optimized individually for each target. Let 𝒱_(t,k) denote the optimal vaccine set of exactly k peptides for target t∈𝒯 (e.g., as computed by the procedures described above). Note that 𝒱_(t,k+1) may not necessarily be a superset of 𝒱_(t,k).
    -   W: 𝒯→[0,1]: Target weighting function mapping each target t∈𝒯 to a probability or weight of t in a particular presentation of interest (e.g., pancreatic cancer; see Exhibit A, Table 1 for example).
    -   POPULATIONCOVERAGE: 𝒱→[0,1]: Function mapping a peptide set into population coverage (e.g., EvalVax). This function may also take as input additional parameters, including HLA haplotype frequencies and a minimum per-individual number of peptide-HLA hits N (here, we compute coverage as P(n≥1) using EvalVax-Robust).

For each target t (individually) and vaccine size (peptide count) k, the unweighted population coverage c_(t,k) is computed:

$c_{t,k} = \text{POPULATIONCOVERAGE}(\mathcal{V}_{t,k})\ \forall t \in \mathcal{T}, k$

-   -   Note that for each target t, c_(t,k) is generally monotonically        increasing and concave down for increasing values of k (each        additional peptide increases coverage but with decreasing        returns).

For each target t (individually), the marginal coverage m_(t,k) of the k-th peptide added to the vaccine set is computed:

$m_{t,k} = \begin{cases} c_{t,k}, & \text{if } k = 1 \\ c_{t,k} - c_{t,k-1}, & \text{otherwise} \end{cases} \quad \forall t \in \mathcal{T}, k$

-   -   Note that for each target t, m_(t,k) should be a monotonically        decreasing function in k (by Step 1 above).

The weighted marginal population coverage m̃_(t,k) is computed using weights of each target in W:

$\tilde{m}_{t,k} = W(t) \cdot m_{t,k}\ \forall t \in \mathcal{T}, k$

-   -   The weighted marginal population coverage gives the effective        marginal coverage of the k-th peptide in the vaccine weighted by        the prevalence of the target in the presentation (by        multiplication with the probability/weight of the target in the        presentation).
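The marginal and weighted marginal coverages for one target can be computed in a single pass over its coverage curve. The sketch below uses a hypothetical function name and treats the coverage with no peptides as zero, so the k=1 case falls out of the same subtraction.

```python
def weighted_marginals(coverage, weight):
    """Weighted marginal coverage list for one target.

    coverage: [c_1, c_2, ...], population coverage of the optimal 1-, 2-, ... peptide sets
    weight:   W(t), the probability/weight of the target in the presentation
    """
    result = []
    previous = 0.0  # coverage with no peptides is zero, so m_1 = c_1
    for c in coverage:
        result.append(weight * (c - previous))
        previous = c
    return result
```

For a hypothetical coverage curve [0.5, 0.7, 0.8] and weight 0.5, the weighted marginals are 0.25, 0.10, and 0.05, a descending list as required by the merge step.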

The individual vaccines are combined into a combined vaccine via the MERGEMULTI procedure called on the weighted marginal population coverage lists m̃_(t)=[m̃_(t,k), k∈1, 2, . . . ]. FIG. 9 shows an example Python implementation of the MERGEMULTI function. This procedure takes as input multiple sorted (descending) lists and merges them into a single sorted (descending) list. Let M indicate the output of MERGEMULTI where each element M_(k) contains both the marginal weighted coverage and source (target) of the k-th peptide in the combined vaccine. The combined vaccine contains peptides from different targets. In particular, the combined vaccine with k peptides contains C_(t,k)=Σ_(j≤k) 𝟙{M_(j) from t} peptides from target t. Note that C_(t,k)∈[0, . . . , k] and Σ_(t)C_(t,k)=k (C_(t,k) gives the distribution of the k peptides in the combined vaccine across the targets).
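A merge of this kind can be sketched with the standard library. This is not the FIG. 9 code, which is not reproduced in this section; `merge_multi` and `counts_at` are hypothetical names, and the sketch relies on `heapq.merge`, which requires each input list to already be sorted in descending order when `reverse=True` is used.

```python
import heapq
from collections import Counter


def merge_multi(weighted_marginals_by_target):
    """Merge per-target descending lists into one descending list of (value, target)."""
    streams = [
        [(value, target) for value in values]
        for target, values in weighted_marginals_by_target.items()
    ]
    # Inputs must each be sorted largest-first; compare on the coverage value only
    return list(heapq.merge(*streams, key=lambda item: item[0], reverse=True))


def counts_at(merged, k):
    """C_{t,k}: number of the first k merged elements that come from each target."""
    return Counter(target for _, target in merged[:k])
```

With hypothetical inputs `{"G12D": [0.2, 0.05], "G12V": [0.15, 0.1], "G12R": [0.08]}`, the first three merged elements come from G12D, G12V, and G12V, so a three-peptide combined vaccine would draw one peptide from the G12D design and two from the G12V design.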

The optimal combined vaccine set 𝒱̂_(k) is defined as:

$\hat{\mathcal{V}}_{k} = \bigcup\limits_{t \in \mathcal{T}} \mathcal{V}_{t,C_{t,k}}$

Thus, the combined vaccine with k peptides is the combination of the optimal individual (C_(t,k))-peptide vaccines. The marginal weighted coverage values of the combined vaccine M_(k) can be cumulatively summed over k to give the overall effective (target-weighted) population coverage of the combined vaccine containing k peptides as Σ_(j≤k)M_(j) (taking into account both the probabilities/weights of the targets in the presentation and the expected population coverage of peptides based on HLA display). The final vaccine size k can vary based upon the specific population coverage goals of the vaccine.
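Assembling the combined set and its target-weighted coverage from a merged list can be sketched as follows. The function name and data layout are assumptions: `merged` is a descending list of (weighted marginal coverage, target) pairs, and `vaccine_sets[target][c]` holds the optimal c-peptide set for that target.

```python
from collections import Counter


def combined_vaccine(merged, vaccine_sets, k):
    """Return (peptide set, target-weighted coverage) of the k-peptide combined vaccine."""
    # C_{t,k}: how many of the first k merged elements come from each target
    counts = Counter(target for _, target in merged[:k])
    peptides = set()
    for target, c in counts.items():
        # Take the whole optimal c-peptide design, since the (c+1)-peptide
        # design need not be a superset of the c-peptide design
        peptides |= set(vaccine_sets[target][c])
    # Cumulative sum of the first k weighted marginal coverages
    coverage = sum(value for value, _ in merged[:k])
    return peptides, coverage
```

Looking up the full c-peptide design per target, rather than appending one peptide at a time, is what lets the combined vaccine respect designs whose composition changes with size.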

MHC Class I Peptide Sequences

In some embodiments, a peptide vaccine (single target or combined multiple target vaccine) comprises about five, ten, or twenty MHC class I peptides with each peptide consisting of 8 or more amino acids. In some embodiments, an MHC class I peptide vaccine is intended for one or more of the KRAS G12D, G12V, and G12R targets. In some embodiments, the amino acid sequence of a first peptide in a five-peptide combined vaccine comprises SEQ ID NO: 1. GADGVGKSM (SEQ ID NO: 1). In some embodiments, the amino acid sequence of a second peptide in a five-peptide combined vaccine comprises SEQ ID NO: 2. LMVVGADGV (SEQ ID NO: 2). In some embodiments, the amino acid sequence of a third peptide in a five-peptide combined vaccine comprises SEQ ID NO: 3. GAVGVGKSL (SEQ ID NO: 3). In some embodiments, the amino acid sequence of a fourth peptide in a five-peptide combined vaccine comprises SEQ ID NO: 4. LMVVGAVGV (SEQ ID NO: 4). In some embodiments, the amino acid sequence of a fifth peptide in a five-peptide combined vaccine comprises SEQ ID NO: 5. VTGARGVGK (SEQ ID NO: 5). An example combined vaccine for the KRAS G12D, G12V, and G12R targets with five peptides (SEQ ID NO: 1 to SEQ ID NO: 5) is predicted to have a weighted population coverage of 0.3620.

In some embodiments, any one of the peptides (peptides 1-5) in the five-peptide vaccine comprises an amino acid sequence 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98 or 99% identical to SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, or SEQ ID NO: 5.

In some embodiments, the amino acid sequences of peptides 1 to 5 in a ten-peptide combined vaccine comprise SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, and SEQ ID NO: 5. In some embodiments, the amino acid sequence of a sixth peptide in a ten-peptide combined vaccine comprises SEQ ID NO: 6. VMGAVGVGK (SEQ ID NO: 6). In some embodiments, the amino acid sequence of a seventh peptide in a ten-peptide combined vaccine comprises SEQ ID NO: 7. VVGAVGVGK (SEQ ID NO: 7). In some embodiments, the amino acid sequence of an eighth peptide in a ten-peptide combined vaccine comprises SEQ ID NO: 8. GARGVGKSY (SEQ ID NO: 8). In some embodiments, the amino acid sequence of a ninth peptide in a ten-peptide combined vaccine comprises SEQ ID NO: 9. GPRGVGKSA (SEQ ID NO: 9). In some embodiments, the amino acid sequence of a tenth peptide in a ten-peptide combined vaccine comprises SEQ ID NO: 10. LMVVGARGV (SEQ ID NO: 10). An example combined vaccine for the KRAS G12D, G12V, and G12R targets with ten peptides (SEQ ID NO: 1 to SEQ ID NO: 10) is predicted to have a weighted population coverage of 0.4374.

In some embodiments, any one of the peptides (peptides 1-10) in the ten-peptide vaccine comprises an amino acid sequence 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98 or 99% identical to SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, or SEQ ID NO: 10.

In some embodiments, the amino acid sequences of peptides 1 to 10 in a twenty-peptide combined vaccine comprise SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, and SEQ ID NO: 10. In some embodiments, the amino acid sequence of an 11^(th) peptide in a twenty-peptide combined vaccine comprises SEQ ID NO: 11. GADGVGKSL (SEQ ID NO: 11). In some embodiments, the amino acid sequence of a 12^(th) peptide in a twenty-peptide combined vaccine comprises SEQ ID NO: 12. GADGVGKSY (SEQ ID NO: 12). In some embodiments, the amino acid sequence of a 13^(th) peptide in a twenty-peptide combined vaccine comprises SEQ ID NO: 13. GYDGVGKSM (SEQ ID NO: 13). In some embodiments, the amino acid sequence of a 14^(th) peptide in a twenty-peptide combined vaccine comprises SEQ ID NO: 14. GPVGVGKSV (SEQ ID NO: 14). In some embodiments, the amino acid sequence of a 15^(th) peptide in a twenty-peptide combined vaccine comprises SEQ ID NO: 15. LTVVGAVGV (SEQ ID NO: 15). In some embodiments, the amino acid sequence of a 16^(th) peptide in a twenty-peptide combined vaccine comprises SEQ ID NO: 16. VVGAVGVGR (SEQ ID NO: 16). In some embodiments, the amino acid sequence of a 17^(th) peptide in a twenty-peptide combined vaccine comprises SEQ ID NO: 17. GARGVGKSM (SEQ ID NO: 17). In some embodiments, the amino acid sequence of an 18^(th) peptide in a twenty-peptide combined vaccine comprises SEQ ID NO: 18. GPRGVGKSV (SEQ ID NO: 18). In some embodiments, the amino acid sequence of a 19^(th) peptide in a twenty-peptide combined vaccine comprises SEQ ID NO: 19. LLVVGARGV (SEQ ID NO: 19). In some embodiments, the amino acid sequence of a 20^(th) peptide in a twenty-peptide combined vaccine comprises SEQ ID NO: 20. VAGARGVGM (SEQ ID NO: 20). An example combined vaccine for the KRAS G12D, G12V, and G12R targets with twenty peptides (SEQ ID NO: 1 to SEQ ID NO: 20) is predicted to have a weighted population coverage of 0.4604.

In some embodiments, any one of the peptides (peptides 1-20) in the twenty-peptide vaccine comprises an amino acid sequence 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98 or 99% identical to SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, or SEQ ID NO: 20.

Table 1 shows MHC class I peptide sequences described herein including the respective SEQ ID NO, amino acid sequence corresponding to the SEQ ID NO, KRAS protein target (with specific mutation), the seed amino acid sequence (i.e., the amino acid sequence of the wild type KRAS fragment), the amino acid substitution (if any) for heteroclitic peptides at positions 2 and 9, and notes detailing embodiments in which the peptide may be included in a 5, 10, or 20 combined peptide vaccine as described herein. Table 1 also includes additional peptide sequences comprising SEQ ID NOs: 21-41. In some embodiments, any combination of peptides listed in Table 1 (SEQ ID NOs: 1-41) may be used to create a combined peptide vaccine having between about 2 and about 40 peptides. In some embodiments, any one of the peptides (peptides 1-41; SEQ ID NOs: 1-41) in the combined vaccine comprises an amino acid sequence 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98 or 99% identical to any of SEQ ID NOs: 1-41.

TABLE 1 Example KRAS Vaccine Peptides (MHC class I)

| SEQ ID NO | Sequence corresponding to SEQ ID | Target | Seed | Heteroclitic Modification P2 | Heteroclitic Modification P9 | Note |
|---|---|---|---|---|---|---|
| SEQ ID NO: 1 | GADGVGKSM | KRAS G12D | GADGVGKSA | — | A9M | Individual KRAS G12D (MHCflurry); Combined (5 peptide) (MHCflurry); Combined (10 peptide) (MHCflurry); Combined (20 peptide) (MHCflurry) |
| SEQ ID NO: 2 | LMVVGADGV | KRAS G12D | LVVVGADGV | V2M | — | Individual KRAS G12D (MHCflurry); Individual KRAS G12D (NetMHCpan); Combined (5 peptide) (MHCflurry); Combined (10 peptide) (MHCflurry); Combined (20 peptide) (MHCflurry) |
| SEQ ID NO: 3 | GAVGVGKSL | KRAS G12V | GAVGVGKSA | — | A9L | Individual KRAS G12V (MHCflurry); Combined (5 peptide) (MHCflurry); Combined (10 peptide) (MHCflurry); Combined (20 peptide) (MHCflurry) |
| SEQ ID NO: 4 | LMVVGAVGV | KRAS G12V | LVVVGAVGV | V2M | — | Individual KRAS G12V (MHCflurry); Individual KRAS G12V (NetMHCpan); Combined (5 peptide) (MHCflurry); Combined (10 peptide) (MHCflurry); Combined (20 peptide) (MHCflurry) |
| SEQ ID NO: 5 | VTGARGVGK | KRAS G12R | VVGARGVGK | V2T | — | Individual KRAS G12R (MHCflurry); Combined (5 peptide) (MHCflurry); Combined (10 peptide) (MHCflurry); Combined (20 peptide) (MHCflurry) |
| SEQ ID NO: 6 | VMGAVGVGK | KRAS G12V | VVGAVGVGK | V2M | — | Individual KRAS G12V (MHCflurry); Combined (10 peptide) (MHCflurry); Combined (20 peptide) (MHCflurry) |
| SEQ ID NO: 7 | VVGAVGVGK | KRAS G12V | VVGAVGVGK | — | — | Individual KRAS G12V (MHCflurry); Individual KRAS G12V (NetMHCpan); Combined (10 peptide) (MHCflurry); Combined (20 peptide) (MHCflurry) |
| SEQ ID NO: 8 | GARGVGKSY | KRAS G12R | GARGVGKSA | — | A9Y | Individual KRAS G12R (MHCflurry); Combined (10 peptide) (MHCflurry); Combined (20 peptide) (MHCflurry) |
| SEQ ID NO: 9 | GPRGVGKSA | KRAS G12R | GARGVGKSA | A2P | — | Individual KRAS G12R (MHCflurry); Combined (10 peptide) (MHCflurry); Combined (20 peptide) (MHCflurry) |
| SEQ ID NO: 10 | LMVVGARGV | KRAS G12R | LVVVGARGV | V2M | — | Individual KRAS G12R (MHCflurry); Individual KRAS G12R (NetMHCpan); Combined (10 peptide) (MHCflurry); Combined (20 peptide) (MHCflurry) |
| SEQ ID NO: 11 | GADGVGKSL | KRAS G12D | GADGVGKSA | — | A9L | Individual KRAS G12D (MHCflurry); Combined (20 peptide) (MHCflurry) |
| SEQ ID NO: 12 | GADGVGKSY | KRAS G12D | GADGVGKSA | — | A9Y | Individual KRAS G12D (MHCflurry); Combined (20 peptide) (MHCflurry) |
| SEQ ID NO: 13 | GYDGVGKSM | KRAS G12D | GADGVGKSA | A2Y | A9M | Individual KRAS G12D (MHCflurry); Combined (20 peptide) (MHCflurry) |
| SEQ ID NO: 14 | GPVGVGKSV | KRAS G12V | GAVGVGKSA | A2P | A9V | Combined (20 peptide) (MHCflurry) |
| SEQ ID NO: 15 | LTVVGAVGV | KRAS G12V | LVVVGAVGV | V2T | — | Individual KRAS G12V (NetMHCpan); Combined (20 peptide) (MHCflurry) |
| SEQ ID NO: 16 | VVGAVGVGR | KRAS G12V | VVGAVGVGK | — | K9R | Individual KRAS G12V (MHCflurry); Individual KRAS G12V (NetMHCpan); Combined (20 peptide) (MHCflurry) |
| SEQ ID NO: 17 | GARGVGKSM | KRAS G12R | GARGVGKSA | — | A9M | Combined (20 peptide) (MHCflurry) |
| SEQ ID NO: 18 | GPRGVGKSV | KRAS G12R | GARGVGKSA | A2P | A9V | Combined (20 peptide) (MHCflurry) |
| SEQ ID NO: 19 | LLVVGARGV | KRAS G12R | LVVVGARGV | V2L | — | Individual KRAS G12R (NetMHCpan); Combined (20 peptide) (MHCflurry) |
| SEQ ID NO: 20 | VAGARGVGM | KRAS G12R | VVGARGVGK | V2A | K9M | Individual KRAS G12R (MHCflurry); Combined (20 peptide) (MHCflurry) |
| SEQ ID NO: 21 | LTVVGADGV | KRAS G12D | LVVVGADGV | V2T | — | Individual KRAS G12D (NetMHCpan) |
| SEQ ID NO: 22 | LLVVGADGV | KRAS G12D | LVVVGADGV | V2L | — | Individual KRAS G12D (NetMHCpan) |
| SEQ ID NO: 23 | LMVVGADGL | KRAS G12D | LVVVGADGV | V2M | V9L | Individual KRAS G12D (NetMHCpan) |
| SEQ ID NO: 24 | VMGAVGVGR | KRAS G12V | VVGAVGVGK | V2M | K9R | Individual KRAS G12V (NetMHCpan) |
| SEQ ID NO: 25 | VMGARGVGK | KRAS G12R | VVGARGVGK | V2M | — | Individual KRAS G12R (NetMHCpan) |
| SEQ ID NO: 26 | GACGVGKSL | KRAS G12C | GACGVGKSA | — | A9L | Individual KRAS G12C (MHCflurry) |
| SEQ ID NO: 27 | LMVVGACGV | KRAS G12C | LVVVGACGV | V2M | — | Individual KRAS G12C (MHCflurry); Individual KRAS G12C (NetMHCpan) |
| SEQ ID NO: 28 | LTVVGACGV | KRAS G12C | LVVVGACGV | V2T | — | Individual KRAS G12C (MHCflurry); Individual KRAS G12C (NetMHCpan) |
| SEQ ID NO: 29 | VTGACGVGK | KRAS G12C | VVGACGVGK | V2T | — | Individual KRAS G12C (MHCflurry) |
| SEQ ID NO: 30 | VVGACGVGR | KRAS G12C | VVGACGVGK | — | K9R | Individual KRAS G12C (MHCflurry) |
| SEQ ID NO: 31 | AADVGKSAM | KRAS G13D | AGDVGKSAL | G2A | L9M | Individual KRAS G13D (MHCflurry); Individual KRAS G13D (NetMHCpan) |
| SEQ ID NO: 32 | AEDVGKSAM | KRAS G13D | AGDVGKSAL | G2E | L9M | Individual KRAS G13D (MHCflurry) |
| SEQ ID NO: 33 | AYDVGKSAM | KRAS G13D | AGDVGKSAL | G2Y | L9M | Individual KRAS G13D (MHCflurry) |
| SEQ ID NO: 34 | DAGKSALTV | KRAS G13D | DVGKSALTI | V2A | I9V | Individual KRAS G13D (MHCflurry) |
| SEQ ID NO: 35 | GAGDVGKSM | KRAS G13D | GAGDVGKSA | — | A9M | Individual KRAS G13D (MHCflurry) |
| SEQ ID NO: 36 | LQVVGACGV | KRAS G12C | LVVVGACGV | V2Q | — | Individual KRAS G12C (NetMHCpan) |
| SEQ ID NO: 37 | VMGACGVGK | KRAS G12C | VVGACGVGK | V2M | — | Individual KRAS G12C (NetMHCpan) |
| SEQ ID NO: 38 | VMGACGVGR | KRAS G12C | VVGACGVGK | V2M | K9R | Individual KRAS G12C (NetMHCpan) |
| SEQ ID NO: 39 | AADVGKSAL | KRAS G13D | AGDVGKSAL | G2A | — | Individual KRAS G13D (NetMHCpan) |
| SEQ ID NO: 40 | ASDVGKSAL | KRAS G13D | AGDVGKSAL | G2S | — | Individual KRAS G13D (NetMHCpan) |
| SEQ ID NO: 41 | ASDVGKSAM | KRAS G13D | AGDVGKSAL | G2S | L9M | Individual KRAS G13D (NetMHCpan) |
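The heteroclitic modification labels in Table 1 follow directly from comparing each peptide with its seed at positions 2 and 9. A short sketch of that derivation is given below; the function name is hypothetical, and it returns `None` where the table shows no modification.

```python
def modification_label(seed, peptide, position):
    """Label a substitution at a 1-based position, e.g. 'A9M', or None if unchanged."""
    if len(seed) != len(peptide):
        raise ValueError("seed and peptide must have the same length")
    original = seed[position - 1]
    replacement = peptide[position - 1]
    if original == replacement:
        return None
    return f"{original}{position}{replacement}"
```

For example, seed `GADGVGKSA` and peptide `GADGVGKSM` (SEQ ID NO: 1) differ only at position 9, giving the label A9M, while seed `LVVVGADGV` and peptide `LMVVGADGV` (SEQ ID NO: 2) give V2M at position 2 and no modification at position 9.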

Additional amino acid sequences of MHC class I heteroclitic peptides are provided in the Sequence Listing (SEQ ID NOs: 67-1522). In some embodiments, any combination of MHC class I peptides disclosed herein (SEQ ID NOs: 1-41 and 67-1522) may be used to create a combined peptide vaccine having between about 2 and about 40 peptides. In some embodiments, any one of the peptides (SEQ ID NOs: 1-41 and 67-1522) in the combined vaccine comprises or contains an amino acid sequence 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98 or 99% identical to any of SEQ ID NOs: 1-41 or 67-1522.

MHC Class II Peptide Sequences

In some embodiments, a peptide vaccine (single target or combined multiple target vaccine) comprises about 2 to 40 MHC class II peptides with each peptide consisting of about 20 amino acids. In some embodiments, an MHC class II peptide vaccine is intended for one or more of the KRAS G12D, G12V, G12R, G12C, and G13D targets.

Table 2 summarizes MHC class II peptide sequences described herein including the respective SEQ ID NO, amino acid sequence corresponding to the SEQ ID NO, the amino acid sequence corresponding to the peptide's binding core, the KRAS protein target (with specific mutation), the seed amino acid sequence (i.e., the amino acid sequence of the wild type KRAS fragment), the seed amino acid sequence of the binding core, and the amino acid substitution (if any) for heteroclitic peptides at positions 1, 4, 6, and 9. Table 2 includes peptide sequences comprising SEQ ID NOs: 42-66. SEQ ID NOs: 42-65 (Table 2) encode for recombinant peptides. In some embodiments, any combination of peptides listed in Table 2 (SEQ ID NOs: 42-66) may be used to create a single target (individual) or combined peptide vaccine having between about 2 and about 40 peptides. In some embodiments, any one of the peptides (peptides 42-66; SEQ ID NOs: 42-66) in the combined vaccine comprises an amino acid sequence 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98 or 99% identical to any of SEQ ID NOs: 42-66.

TABLE 2
Example KRAS Vaccine Peptides (MHC class II)

SEQ ID NO | Sequence corresponding to SEQ ID | Core | Target | Seed | Seed Core | Heteroclitic Modification P1 | Heteroclitic Modification P4 | Heteroclitic Modification P6 | Heteroclitic Modification P9 | Note
SEQ ID NO: 42 | EYKFVVFGSDGAGKS | FVVFGSDGA | KRAS G12D | EYKLVVVGADGVGKS | LVVVGADGV | L1F | V4F | A6S | V9A | Individual KRAS G12D (NetMHCIIpan)
SEQ ID NO: 43 | EYKFVVIGNDGAGKSALTIQLIQN | FVVIGNDGA | KRAS G12D | EYKLVVVGADGVGKSALTIQLIQN | LVVVGADGV | L1F | V4I | A6N | V9A | Individual KRAS G12D (NetMHCIIpan)
SEQ ID NO: 44 | EYKFVVLGADGAGKS | FVVLGADGA | KRAS G12D | EYKLVVVGADGVGKS | LVVVGADGV | L1F | V4L | — | V9A | Individual KRAS G12D (NetMHCIIpan)
SEQ ID NO: 45 | MTEYKFVVSGADGIGKSALT | FVVSGADGI | KRAS G12D | MTEYKLVVVGADGVGKSALT | LVVVGADGV | L1F | V4S | — | V9I | Individual KRAS G12D (NetMHCIIpan)
SEQ ID NO: 46 | MTEYKFVVYGSDGIGKSALT | FVVYGSDGI | KRAS G12D | MTEYKLVVVGADGVGKSALT | LVVVGADGV | L1F | V4Y | A6S | V9I | Individual KRAS G12D (NetMHCIIpan)
SEQ ID NO: 47 | EYKFVVIGRVGHGKS | FVVIGRVGH | KRAS G12V | EYKLVVVGAVGVGKS | LVVVGAVGV | L1F | V4I | A6R | V9H | Individual KRAS G12V (NetMHCIIpan)
SEQ ID NO: 48 | EYKFVVLGTVGHGKS | FVVLGTVGH | KRAS G12V | EYKLVVVGAVGVGKS | LVVVGAVGV | L1F | V4L | A6T | V9H | Individual KRAS G12V (NetMHCIIpan)
SEQ ID NO: 49 | EYKFVVYGNVGMGKS | FVVYGNVGM | KRAS G12V | EYKLVVVGAVGVGKS | LVVVGAVGV | L1F | V4Y | A6N | V9M | Individual KRAS G12V (NetMHCIIpan)
SEQ ID NO: 50 | EYKIVVAGNVGIGKS | IVVAGNVGI | KRAS G12V | EYKLVVVGAVGVGKS | LVVVGAVGV | L1I | V4A | A6N | V9I | Individual KRAS G12V (NetMHCIIpan)
SEQ ID NO: 51 | TEYKIVVMGNVGYGK | IVVMGNVGY | KRAS G12V | TEYKLVVVGAVGVGK | LVVVGAVGV | L1I | V4M | A6N | V9Y | Individual KRAS G12V (NetMHCIIpan)
SEQ ID NO: 52 | MTEYKFVVFGSRGVGKSALT | FVVFGSRGV | KRAS G12R | MTEYKLVVVGARGVGKSALT | LVVVGARGV | L1F | V4F | A6S | — | Individual KRAS G12R (NetMHCIIpan)
SEQ ID NO: 53 | MTEYKFVVIGNRGVGKSALT | FVVIGNRGV | KRAS G12R | MTEYKLVVVGARGVGKSALT | LVVVGARGV | L1F | V4I | A6N | — | Individual KRAS G12R (NetMHCIIpan)
SEQ ID NO: 54 | MTEYKFVVIGVRGDGKSALT | FVVIGVRGD | KRAS G12R | MTEYKLVVVGARGVGKSALT | LVVVGARGV | L1F | V4I | A6V | V9D | Individual KRAS G12R (NetMHCIIpan)
SEQ ID NO: 55 | MTEYKFVVMGSRGAGKSALT | FVVMGSRGA | KRAS G12R | MTEYKLVVVGARGVGKSALT | LVVVGARGV | L1F | V4M | A6S | V9A | Individual KRAS G12R (NetMHCIIpan)
SEQ ID NO: 56 | VVVIARGVPKSLLTI | IARGVPKSL | KRAS G12R | VVVGARGVGKSALTI | GARGVGKSA | G1I | — | G6P | A9L | Individual KRAS G12R (NetMHCIIpan)
SEQ ID NO: 57 | EYKFVVFGNCGAGKS | FVVFGNCGA | KRAS G12C | EYKLVVVGACGVGKS | LVVVGACGV | L1F | V4F | A6N | V9A | Individual KRAS G12C (NetMHCIIpan)
SEQ ID NO: 58 | EYKFVVSGACGVGKS | FVVSGACGV | KRAS G12C | EYKLVVVGACGVGKS | LVVVGACGV | L1F | V4S | — | — | Individual KRAS G12C (NetMHCIIpan)
SEQ ID NO: 59 | EYKFVVSGNCGLGKS | FVVSGNCGL | KRAS G12C | EYKLVVVGACGVGKS | LVVVGACGV | L1F | V4S | A6N | V9L | Individual KRAS G12C (NetMHCIIpan)
SEQ ID NO: 60 | EYKLVVMGPCGAGKS | LVVMGPCGA | KRAS G12C | EYKLVVVGACGVGKS | LVVVGACGV | — | V4M | A6P | V9A | Individual KRAS G12C (NetMHCIIpan)
SEQ ID NO: 61 | KLVIVGICKVGHSAL | IVGICKVGH | KRAS G12C | KLVVVGACGVGKSAL | VVGACGVGK | V1I | A4I | G6K | K9H | Individual KRAS G12C (NetMHCIIpan)
SEQ ID NO: 62 | EYKFVVFGNGDLGKS | FVVFGNGDL | KRAS G13D | EYKLVVVGAGDVGKS | LVVVGAGDV | L1F | V4F | A6N | V9L | Individual KRAS G13D (NetMHCIIpan)
SEQ ID NO: 63 | EYKFVVMGNGDSGKS | FVVMGNGDS | KRAS G13D | EYKLVVVGAGDVGKS | LVVVGAGDV | L1F | V4M | A6N | V9S | Individual KRAS G13D (NetMHCIIpan)
SEQ ID NO: 64 | EYKFVVSGSGDVGKS | FVVSGSGDV | KRAS G13D | EYKLVVVGAGDVGKS | LVVVGAGDV | L1F | V4S | A6S | — | Individual KRAS G13D (NetMHCIIpan)
SEQ ID NO: 65 | EYKIVVMGRGDMGKS | IVVMGRGDM | KRAS G13D | EYKLVVVGAGDVGKS | LVVVGAGDV | L1I | V4M | A6R | V9M | Individual KRAS G13D (NetMHCIIpan)
SEQ ID NO: 66 | YKLVVVGAGDVGKSA | — | KRAS G13D | — | — | — | — | — | — | Individual KRAS G13D (NetMHCIIpan)
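The heteroclitic modification labels in Table 2 can be read as a position-wise comparison of the seed core and the modified core at positions P1, P4, P6, and P9. The following is a minimal sketch of that reading; the function name `anchor_substitutions` is hypothetical and is not part of the disclosure.

```python
from typing import Dict, Optional, Sequence


def anchor_substitutions(
    seed_core: str,
    modified_core: str,
    anchors: Sequence[int] = (1, 4, 6, 9),
) -> Dict[str, Optional[str]]:
    """Label substitutions at the given 1-based anchor positions.

    A change from L to F at position 1 is labelled 'L1F'; an unchanged
    anchor position is reported as None.
    """
    if len(seed_core) != len(modified_core):
        raise ValueError("seed and modified cores must have equal length")
    labels: Dict[str, Optional[str]] = {}
    for pos in anchors:
        old, new = seed_core[pos - 1], modified_core[pos - 1]
        labels[f"P{pos}"] = f"{old}{pos}{new}" if old != new else None
    return labels


# Cores taken from the SEQ ID NO: 42 and SEQ ID NO: 44 rows of Table 2.
print(anchor_substitutions("LVVVGADGV", "FVVFGSDGA"))
print(anchor_substitutions("LVVVGADGV", "FVVLGADGA"))
```

For the SEQ ID NO: 44 row the P6 entry is None, matching the dash shown in the table for an unmodified anchor.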

In some embodiments, any combination of MHC class I and/or MHC class II peptides disclosed herein (SEQ ID NOs: 1-1522) may be used to create a single target (individual) or combined peptide vaccine having between about 2 and about 40 peptides. In some embodiments, any one of the peptides (peptides 1-1522; SEQ ID NOs: 1-1522) in the combined vaccine comprises an amino acid sequence 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98 or 99% identical to any of SEQ ID NOs: 1-1522.
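As an illustration of the percent identity figures recited above, the sketch below counts matching residues between two peptides of equal length. This ungapped comparison is an assumption made for brevity; sequence identity is commonly computed after alignment, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Two 20-residue seed peptides from Table 2 that differ at one residue
# (KRAS G12D versus KRAS G12R).
print(percent_identity("MTEYKLVVVGADGVGKSALT", "MTEYKLVVVGARGVGKSALT"))
```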

mRNA and DNA Vaccines

In some embodiments, vaccine peptides are encoded as mRNA or DNA molecules and are administered for expression in vivo as is known in the art. One example of the delivery of vaccines by mRNA is found in Kranz et al. (2016), incorporated herein by reference. In one embodiment, a construct comprises 10 peptides, including a five-peptide MHC class I combined pancreatic cancer vaccine (targets: KRAS G12D, G12V, G12R) and a five-peptide MHC class II combined pancreatic cancer vaccine (targets: KRAS G12D, G12V, G12R), as optimized by the procedure described herein. Peptides are prepended with a secretion signal sequence at the N-terminus and followed by an MHC class I trafficking signal (MITD) (Kreiter et al., 2008; Sahin et al., 2017). The MITD has been shown to route antigens to pathways for HLA class I and class II presentation (Kreiter et al., 2008). Here we combine all peptides of each MHC class into a single construct using non-immunogenic glycine/serine linkers from Sahin et al. (2017), though it is also plausible to construct individual constructs containing single peptides with the same secretion and MITD signals as demonstrated by Kreiter et al. (2008).

In some embodiments, the amino acid sequence encoded by the mRNA vaccine comprises SEQ ID NO: 1523. Underlined amino acids correspond to the signal peptide (or leader) sequence. Bolded amino acids correspond to MHC class I (9 amino acids in length; 5 peptides) and MHC class II (13-25 amino acids in length; 5 peptides) peptide sequences. Italicized amino acids correspond to the trafficking signal.

MRVTAPRTLILLLSGALALTETWAGSGGSGGGGSGGGADGVGKSMGGSGGGGSGGLMVVGADGVGGSGGGGSGGGAVGVGKSLGGSGGGGSGGLMVVGAVGVGGSGGGGSGGVTGARGVGKGGSGGGGSGGEYKFVVLGTVGHGKSGGSGGGGSGGEYKIVVAGNVGIGKSGGSGGGGSGGEYKFVVFGSDGAGKSGGSGGGGSGGMTEYKFVVSGADGIGKSALTGGSGGGGSGGMTEYKFVVIGNRGVGKSALTGGSLGGGGSGIVGIVAGLAVLAVVVIGAVVATVMCRRKSSGGKGGSYSQAASSDSAQGSDVSLTA (SEQ ID NO: 1523).
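The layout of such a construct (signal peptide, linker-separated peptides, trafficking signal) can be sketched as a simple concatenation. The component boundaries used below are an assumed reading of the flat sequence of SEQ ID NO: 1523, because the underlined, bolded, and italicized formatting is not reproduced in this text; the function name `build_construct` and the constant names are hypothetical.

```python
from typing import Sequence


def build_construct(
    signal: str,
    peptides: Sequence[str],
    linker: str,
    final_linker: str,
    trafficking: str,
) -> str:
    """Concatenate signal, linker-separated peptides, and trafficking signal."""
    parts = [signal]
    for peptide in peptides:
        parts.append(linker)  # one linker precedes every peptide
        parts.append(peptide)
    parts.append(final_linker)  # linker between last peptide and trafficking signal
    parts.append(trafficking)
    return "".join(parts)


# Assumed component boundaries, read off the flat sequence of SEQ ID NO: 1523.
SIGNAL = "MRVTAPRTLILLLSGALALTETWAGS"
LINKER = "GGSGGGGSGG"
FINAL_LINKER = "GGSLGGGGSG"
MITD = "IVGIVAGLAVLAVVVIGAVVATVMCRRKSSGGKGGSYSQAASSDSAQGSDVSLTA"
CLASS_I = ["GADGVGKSM", "LMVVGADGV", "GAVGVGKSL", "LMVVGAVGV", "VTGARGVGK"]
CLASS_II = [
    "EYKFVVLGTVGHGKS",
    "EYKIVVAGNVGIGKS",
    "EYKFVVFGSDGAGKS",
    "MTEYKFVVSGADGIGKSALT",
    "MTEYKFVVIGNRGVGKSALT",
]

construct = build_construct(SIGNAL, CLASS_I + CLASS_II, LINKER, FINAL_LINKER, MITD)
print(construct)
```

Keeping the components in separate variables makes it straightforward to build single-peptide constructs with the same signal and trafficking sequences, as the text above contemplates.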

In some embodiments, the vaccine is an mRNA vaccine comprising a nucleic acid sequence encoding the amino acid sequence consisting of SEQ ID NO: 1523. In some embodiments, the nucleic acid sequence of the mRNA vaccine encodes for an amino acid sequence 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98 or 99% identical to SEQ ID NO: 1523.

In some embodiments, the vaccine is a DNA vaccine comprising a nucleic acid sequence encoding the amino acid sequence consisting of SEQ ID NO: 1523. In some embodiments, the nucleic acid sequence of the DNA vaccine encodes for an amino acid sequence 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98 or 99% identical to SEQ ID NO: 1523.

In some embodiments, one or more MHC class I and/or MHC class II peptides disclosed herein (SEQ ID NOs: 1-1522) can be encoded in one or more mRNA or DNA molecules and administered for expression in vivo. In some embodiments, between about 2 and about 40 peptide sequences are encoded in one or more mRNA constructs. In some embodiments, between about 2 and about 40 peptide sequences are encoded in one or more DNA constructs (i.e., nucleic acids encoding the amino acid sequences comprising one or more of SEQ ID NOs: 1-1522). In some embodiments, the nucleic acid sequence of the mRNA vaccine or the nucleic acid sequence of the DNA vaccine encodes for an amino acid sequence 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98 or 99% identical to any of SEQ ID NOs: 1-1522.

Non-Limiting Embodiments of the Subject Matter

In one aspect, the invention provides for a system for selecting an immunogenic peptide composition comprising a processor, and a memory storing processor-executable instructions that, when executed by the processor, cause the processor to create a first peptide set by selecting a plurality of base peptides, wherein at least one peptide of the plurality of base peptides is associated with a disease, create a second peptide set by adding to the first peptide set a modified peptide, wherein the modified peptide comprises a substitution of at least one residue of a base peptide selected from the plurality of base peptides, and create a third peptide set by selecting a subset of the second peptide set, wherein the selected subset of the second peptide set has a predicted vaccine performance, wherein the predicted vaccine performance has a population coverage above a predetermined threshold, and wherein the subset comprises at least one peptide of the second peptide set.
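One way to picture the selection of a third peptide set whose population coverage exceeds a predetermined threshold is a greedy search over the second peptide set. The sketch below is illustrative only: the greedy strategy, the representation of the population as genotypes with frequencies, and all peptide names, allele assignments, and frequencies are assumptions, not the procedure or data of this disclosure.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Genotypes = Dict[FrozenSet[str], float]  # set of HLA alleles -> population frequency


def population_coverage(
    selected: List[str], hits: Dict[str, Set[str]], genotypes: Genotypes
) -> float:
    """Fraction of the population with at least one predicted peptide-HLA hit."""
    displayed = set().union(*(hits[p] for p in selected))
    return sum((f for alleles, f in genotypes.items() if alleles & displayed), 0.0)


def select_subset(
    candidates: List[str],
    hits: Dict[str, Set[str]],
    genotypes: Genotypes,
    threshold: float,
) -> Tuple[List[str], float]:
    """Greedily add peptides until coverage reaches the threshold."""
    selected: List[str] = []
    coverage = 0.0
    remaining = list(candidates)
    while coverage < threshold and remaining:
        best = max(
            remaining,
            key=lambda p: population_coverage(selected + [p], hits, genotypes),
        )
        best_cov = population_coverage(selected + [best], hits, genotypes)
        if best_cov <= coverage:  # no remaining peptide improves coverage
            break
        selected.append(best)
        remaining.remove(best)
        coverage = best_cov
    return selected, coverage


# Illustrative data only: peptide labels, hits, and frequencies are made up.
hits = {
    "base": {"A*02:01"},
    "mod1": {"A*02:01", "A*11:01"},
    "mod2": {"B*07:02"},
}
genotypes = {
    frozenset({"A*02:01", "B*07:02"}): 0.5,
    frozenset({"A*11:01"}): 0.25,
    frozenset({"B*07:02"}): 0.125,
    frozenset({"C*07:01"}): 0.125,
}
print(select_subset(["base", "mod1", "mod2"], hits, genotypes, threshold=0.8))
```

In this toy case the two modified peptides together reach the threshold without the base peptide, which is the situation the second peptide set is meant to enable.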

In some embodiments, the plurality of base peptides of the first peptide set is derived from a target protein, wherein the target protein is a tumor neoantigen or a pathogen proteome. In some embodiments, selecting the plurality of base peptides to create the first peptide set comprises sliding a window of size n across an amino acid sequence encoding the target protein, wherein n is between about 8 amino acids and about 25 amino acids in length, and wherein n is a length of each peptide of the plurality of base peptides of the first peptide set. In some embodiments, a peptide of the plurality of base peptides binds to an HLA class I molecule or an HLA class II molecule. In some embodiments, the substitution of the at least one residue comprises substituting an amino acid at an anchor residue position for a different amino acid at the anchor residue position. In some embodiments, the system further comprises filtering the first peptide set to exclude a peptide with a predicted binding core that contains a target residue in an anchor position. In some embodiments, the second peptide set comprises the first peptide set. In some embodiments, the prediction to be bound by the one or more HLA alleles is computed using a binding affinity less than about 1000 nM. In some embodiments, the predicted vaccine performance is determined by computing a plurality of peptide-HLA immunogenicities of the third peptide set to at least one HLA allele. In some embodiments, each peptide-HLA immunogenicity of the plurality of peptide-HLA immunogenicities of the third peptide set is based on a predicted binding affinity of less than about 500 nM. In some embodiments, the predicted vaccine performance is based on a population coverage, wherein the population coverage is computed based on a frequency of an HLA haplotype in a human population. In some embodiments, the predicted vaccine performance is based on a population coverage, wherein the population coverage is computed based on a frequency of at least two HLA alleles in a human population. In some embodiments, the plurality of base peptides is present in a single subject. In some embodiments, the predicted vaccine performance is an expected number of peptide-HLA hits. In some embodiments, the disease is cancer, and wherein the cancer is selected from the group consisting of pancreas, colon, rectum, kidney, bronchus, lung, uterus, cervix, bladder, liver, and stomach. In some embodiments, the plurality of base peptides of the first peptide set comprises at least one self-peptide.
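The sliding-window step described above can be sketched as follows. The function name `sliding_window_peptides` is hypothetical, and the input is a 20-residue KRAS G12D fragment taken from the seed column of Table 2 rather than a full target protein.

```python
from typing import List


def sliding_window_peptides(protein: str, n: int) -> List[str]:
    """Return every contiguous peptide of length n from the protein sequence."""
    if n < 1:
        raise ValueError("window size n must be positive")
    # range() is empty when the protein is shorter than n, giving no peptides.
    return [protein[i:i + n] for i in range(len(protein) - n + 1)]


fragment = "MTEYKLVVVGADGVGKSALT"  # KRAS G12D seed fragment from Table 2
base_peptides = sliding_window_peptides(fragment, 9)
print(len(base_peptides), base_peptides[0], base_peptides[-1])
```

Running the window for each n in the recited range (about 8 to about 25) and pooling the results would give candidate base peptides of every permitted length.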

In another aspect, the invention provides for a non-transitory computer-readable storage medium comprising computer-readable instructions for determining an immunogenic peptide composition that, when executed by a processor cause the processor to create a first peptide set by selecting a plurality of base peptides, wherein a first base peptide and a second base peptide of the plurality of base peptides are each scored for binding by two or more HLA alleles, wherein the first base peptide and the second base peptide are each predicted to be bound by one or more HLA alleles, and wherein the first base peptide and the second base peptide are associated with a disease, create a second peptide set comprising the first base peptide, the second base peptide, a first modified peptide, and a second modified peptide, wherein the first modified peptide comprises a substitution of at least one residue of the first base peptide, and wherein the second modified peptide comprises a substitution of at least one residue of the second base peptide, and create a third peptide set by selecting a subset of the second peptide set, wherein the selected subset of the second peptide set has a predicted vaccine performance, and wherein the predicted vaccine performance is a function of a peptide-HLA immunogenicity of at least one peptide of the third peptide set with respect to the two or more HLA alleles.

In some embodiments, the plurality of base peptides of the first peptide set is derived from a target protein, wherein the target protein is a tumor neoantigen or a pathogen proteome. In some embodiments, selecting the plurality of base peptides to create the first peptide set comprises sliding a window of size n across an amino acid sequence encoding the target protein, wherein n is between about 8 amino acids and about 25 amino acids in length, and wherein n is a length of each peptide of the plurality of base peptides of the first peptide set. In some embodiments, a peptide of the plurality of base peptides binds to an HLA class I molecule or an HLA class II molecule. In some embodiments, the substitution of the at least one residue comprises substituting an amino acid at an anchor residue position for a different amino acid at the anchor residue position. In some embodiments, the non-transitory computer-readable storage medium further comprises filtering the first peptide set to exclude a peptide with a predicted binding core that contains a target residue in an anchor position. In some embodiments, the second peptide set comprises the first peptide set. In some embodiments, the prediction to be bound by the two or more HLA alleles is computed using a binding affinity less than about 1000 nM. In some embodiments, the plurality of base peptides of the first peptide set comprises at least one self-peptide.
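The filtering step, which excludes a peptide whose predicted binding core places the target residue at an anchor position, might be sketched as below. The anchor positions P1, P4, P6, and P9 follow the columns of Table 2 for MHC class II; the function names are hypothetical, and the core start offsets are supplied by hand here, whereas in practice they would come from a binding predictor.

```python
from typing import Iterable, List, Tuple

CLASS_II_ANCHORS = (1, 4, 6, 9)  # P1, P4, P6, P9, as in the columns of Table 2


def target_in_anchor(
    core_start: int,
    target_index: int,
    anchors: Iterable[int] = CLASS_II_ANCHORS,
    core_len: int = 9,
) -> bool:
    """True if the target residue sits at an anchor position of the core.

    core_start and target_index are 0-based indices into the peptide.
    """
    core_pos = target_index - core_start + 1  # 1-based position within the core
    return 1 <= core_pos <= core_len and core_pos in set(anchors)


def filter_base_peptides(candidates: List[Tuple[str, int, int]]) -> List[str]:
    """Keep peptides whose target residue is not at an anchor position.

    Each candidate is (peptide, core_start, target_index).
    """
    return [
        peptide
        for peptide, core_start, target_index in candidates
        if not target_in_anchor(core_start, target_index)
    ]


candidates = [
    # Core LVVVGADGV (Table 2): the G12D residue D falls at P7, so it is kept.
    ("EYKLVVVGADGVGKS", 3, 9),
    # Hypothetical core VGADGVGKS: D falls at P4, so the peptide is excluded.
    ("VVVGADGVGKSALTI", 2, 5),
]
print(filter_base_peptides(candidates))
```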

In another aspect, the invention provides for a system for selecting an immunogenic peptide composition comprising a processor, and a memory storing processor-executable instructions that, when executed by the processor, cause the processor to create a first peptide set by selecting a plurality of base peptides, wherein a first base peptide of the plurality of base peptides is scored for binding by three or more HLA alleles, wherein the first base peptide is predicted to be bound by one or more HLA alleles, and wherein the first base peptide is associated with a disease, create a second peptide set comprising the first base peptide and a modified peptide, wherein the modified peptide comprises a substitution of at least one residue of the first base peptide, and create a third peptide set by selecting a subset of the second peptide set, wherein the selected subset of the second peptide set has a predicted vaccine performance, and wherein the predicted vaccine performance is a function of a peptide-HLA immunogenicity of at least one peptide of the third peptide set with respect to the three or more HLA alleles.

In some embodiments, the first base peptide is scored for binding based on data obtained from experimental assays. In some embodiments, the predicted vaccine performance includes a peptide-HLA immunogenicity of the modified peptide bound to the first HLA allele of the one or more HLA alleles if the first base peptide is predicted to be bound to the first HLA allele of the one or more HLA alleles with a first binding core, wherein the first binding core is a binding core of the first base peptide, wherein the first binding core is identical to a second binding core, and wherein the second binding core is a binding core of the modified peptide bound to the first HLA allele.

In another aspect, the invention provides for a non-transitory computer-readable storage medium comprising computer-readable instructions for determining an immunogenic peptide composition that, when executed by a processor cause the processor to create a first peptide set by selecting a plurality of base peptides, wherein at least one peptide of the plurality of base peptides is associated with a disease, create a second peptide set comprising a first base peptide selected from the first base peptide set and a modified peptide, wherein the modified peptide comprises a substitution of at least one residue of the first base peptide, and create a third peptide set by selecting a subset of the second peptide set, wherein the selected subset of the second peptide set has a predicted vaccine performance, wherein the predicted vaccine performance has an expected number of peptide-HLA hits above a predetermined threshold, and wherein the subset comprises at least one peptide of the second peptide set.

In some embodiments, the first base peptide binds to an HLA class I molecule or an HLA class II molecule.
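An expected number of peptide-HLA hits, as recited in the preceding aspect, can be illustrated as a frequency-weighted count of predicted hits per genotype. The formulation below is a simplified assumption, and the peptide labels, allele assignments, and frequencies are made up for illustration.

```python
from typing import Dict, FrozenSet, List, Set


def expected_hits(
    selected: List[str],
    hits: Dict[str, Set[str]],
    genotypes: Dict[FrozenSet[str], float],
) -> float:
    """Frequency-weighted mean number of peptide-HLA hits per individual."""
    total = 0.0
    for alleles, freq in genotypes.items():
        # Count every (peptide, allele) pair that this genotype can display.
        n_hits = sum(len(hits[p] & alleles) for p in selected)
        total += freq * n_hits
    return total


# Illustrative data only: peptide labels, hits, and frequencies are made up.
hits = {
    "mod1": {"A*02:01", "A*11:01"},
    "mod2": {"B*07:02"},
}
genotypes = {
    frozenset({"A*02:01", "B*07:02"}): 0.5,
    frozenset({"A*11:01"}): 0.25,
    frozenset({"B*07:02"}): 0.125,
    frozenset({"C*07:01"}): 0.125,
}
value = expected_hits(["mod1", "mod2"], hits, genotypes)
print(value, value > 1.0)  # compare against a predetermined threshold of 1.0
```

Unlike population coverage, this measure rewards a vaccine for giving an individual several distinct hits rather than only one.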

In another aspect, the invention provides for a system for selecting an immunogenic peptide composition comprising a processor, and a memory storing processor-executable instructions that, when executed by the processor, cause the processor to create a first peptide set by selecting a first plurality of peptides, wherein the first plurality of peptides comprises a plurality of target peptides that are associated with a first disease, and wherein the first peptide set has a first predicted vaccine performance value, create a second peptide set by selecting a second plurality of peptides, wherein the second plurality of peptides comprises a plurality of target peptides that are associated with a second disease, and wherein the second peptide set has a second predicted vaccine performance value, create a first weighted peptide set by multiplying a first weight by the first predicted vaccine performance value, create a second weighted peptide set by multiplying a second weight by the second predicted vaccine performance value, and create a third peptide set by combining the first weighted peptide set and the second weighted peptide set.

In some embodiments, the first predicted vaccine performance value and the second predicted vaccine performance value are computed based on a population coverage of a vaccine. In some embodiments, the first predicted vaccine performance value and the second predicted vaccine performance value are computed based on an expected number of peptide-HLA hits. In some embodiments, the first plurality of peptides is derived from a tumor neoantigen or a pathogen proteome. In some embodiments, the second plurality of peptides is derived from a tumor neoantigen or a pathogen proteome. In some embodiments, the first disease is cancer, and wherein the cancer is selected from the group consisting of pancreas, colon, rectum, kidney, bronchus, lung, uterus, cervix, bladder, liver, and stomach. In some embodiments, the second disease is cancer, and wherein the cancer is selected from the group consisting of pancreas, colon, rectum, kidney, bronchus, lung, uterus, cervix, bladder, liver, and stomach. In some embodiments, the first plurality of peptides comprises a peptide that binds to an HLA class I molecule or an HLA class II molecule. In some embodiments, the second plurality of peptides comprises a peptide that binds to an HLA class I molecule or an HLA class II molecule.
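The weighting of per-disease performance values can be illustrated with a short calculation. Summing the weighted values into one combined score is an assumption made here for illustration; the text above specifies the multiplication by weights and the combination of the weighted sets, not a particular combining rule. The function name and the numbers are hypothetical.

```python
from typing import Sequence


def combined_performance(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-disease predicted vaccine performance values."""
    if len(values) != len(weights):
        raise ValueError("one weight is required per performance value")
    return sum(w * v for w, v in zip(weights, values))


# Illustrative values: coverage 0.875 for a first disease, 0.5 for a second,
# with the first disease weighted three times as heavily as the second.
score = combined_performance(values=[0.875, 0.5], weights=[0.75, 0.25])
print(score)
```

Such a score lets candidate combined vaccines be compared while reflecting the relative priority given to each disease.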

Compositions

In some embodiments, a peptide vaccine comprises one or more peptides of this disclosure and is administered in a pharmaceutical composition that includes a pharmaceutically acceptable carrier. In some embodiments, the peptide vaccine is comprised of the third peptide set, as described in this disclosure. In some embodiments, the pharmaceutical composition is in the form of a spray, aerosol, gel, solution, emulsion, lipid nanoparticle, nanoparticle, or suspension.

The composition is preferably administered to a subject with a pharmaceutically acceptable carrier. Typically, in some embodiments, an appropriate amount of a pharmaceutically acceptable salt is used in the formulation, which in some embodiments can render the formulation isotonic.

In certain embodiments, the peptides are provided as an immunogenic composition comprising any one of the peptides described herein and a pharmaceutically acceptable carrier. In certain embodiments, the immunogenic composition further comprises an adjuvant. In certain embodiments, the peptides are conjugated with other molecules to increase their effectiveness as is known by those practiced in the art. For example, peptides can be coupled to antibodies that recognize cell surface proteins on antigen presenting cells to enhance vaccine effectiveness. One such method for increasing the effectiveness of peptide delivery is described in Woodham, et al. (2018). In certain embodiments for the treatment of autoimmune disorders, the peptides are delivered with a composition and protocol designed to induce tolerance as is known in the art. Example methods for using peptides for immune tolerization are described in Alhadj Ali, et al. (2017) and Gibson, et al. (2015).

In some embodiments, the pharmaceutically acceptable carrier is selected from the group consisting of saline, Ringer's solution, dextrose solution, and a combination thereof. Other suitable pharmaceutically acceptable carriers known in the art are contemplated. Suitable carriers and their formulations are described in Remington's Pharmaceutical Sciences, 2005, Mack Publishing Co. The pH of the solution is preferably from about 5 to about 8, and more preferably from about 7 to about 7.5. The formulation may also comprise a lyophilized powder. Further carriers include sustained release preparations such as semipermeable matrices of solid hydrophobic polymers, which matrices are in the form of shaped articles, e.g., films, liposomes or microparticles. It will be apparent to those persons skilled in the art that certain carriers may be more preferable depending upon, for instance, the route of administration and concentration of peptides being administered.

The phrase "pharmaceutically acceptable carrier" as used herein means a pharmaceutically acceptable material, composition or vehicle, such as a liquid or solid filler, diluent, excipient, solvent or encapsulating material, involved in carrying or transporting the subject pharmaceutical agent from one organ, or portion of the body, to another organ, or portion of the body. Each carrier is acceptable in the sense of being compatible with the other ingredients of the formulation and not injurious to the patient. Some examples of materials which can serve as pharmaceutically acceptable carriers include: sugars, such as lactose, glucose and sucrose; starches, such as corn starch and potato starch; cellulose, and its derivatives, such as sodium carboxymethyl cellulose, ethyl cellulose and cellulose acetate; powdered tragacanth; malt; gelatin; talc; excipients, such as cocoa butter and suppository waxes; oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil and soybean oil; glycols, such as butylene glycol; polyols, such as glycerin, sorbitol, mannitol and polyethylene glycol; esters, such as ethyl oleate and ethyl laurate; agar; buffering agents, such as magnesium hydroxide and aluminum hydroxide; alginic acid; pyrogen-free water; isotonic saline; Ringer's solution; ethyl alcohol; phosphate buffer solutions; and other non-toxic compatible substances employed in pharmaceutical formulations. The term "carrier" denotes an organic or inorganic ingredient, natural or synthetic, with which the active ingredient is combined to facilitate the application. The components of the pharmaceutical compositions also are capable of being commingled with the compounds of the present invention, and with each other, in a manner such that there is no interaction which would substantially impair the desired pharmaceutical efficiency. The composition may also include additional agents such as an isotonicity agent, a preservative, a surfactant, and a divalent cation, preferably zinc.

The composition can also include an excipient, or an agent for stabilization of a peptide composition, such as a buffer, a reducing agent, a bulk protein, amino acids (e.g., glycine or proline) or a carbohydrate. Bulk proteins useful in formulating peptide compositions include albumin. Typical carbohydrates useful in formulating peptides include but are not limited to sucrose, mannitol, lactose, trehalose, or glucose.

Surfactants may also be used to prevent soluble and insoluble aggregation and/or precipitation of peptides or proteins included in the composition. Suitable surfactants include but are not limited to sorbitan trioleate, soya lecithin, and oleic acid. In certain cases, solution aerosols are preferred using solvents such as ethanol. Thus, formulations including peptides can also include a surfactant that can reduce or prevent surface-induced aggregation of peptides by atomization of the solution in forming an aerosol. Various conventional surfactants can be employed, such as polyoxyethylene fatty acid esters and alcohols, and polyoxyethylene sorbitol fatty acid esters. Amounts will generally range between 0.001% and 4% by weight of the formulation. In some embodiments, surfactants used with the present disclosure are polyoxyethylene sorbitan mono-oleate, polysorbate 80, or polysorbate 20. Additional agents known in the art can also be included in the composition.

In some embodiments, the pharmaceutical compositions and dosage forms further comprise one or more compounds that reduce the rate by which an active ingredient will decay, or the composition will change in character. So-called stabilizers or preservatives may include, but are not limited to, amino acids, antioxidants, pH buffers, or salt buffers. Nonlimiting examples of antioxidants include butylated hydroxy anisole (BHA), ascorbic acid and derivatives thereof, tocopherol and derivatives thereof, and cysteine. Nonlimiting examples of preservatives include parabens, such as methyl or propyl p-hydroxybenzoate, and benzalkonium chloride. Additional nonlimiting examples of amino acids include glycine or proline.

The present invention also teaches the stabilization (preventing or minimizing thermally or mechanically induced soluble or insoluble aggregation and/or precipitation of an inhibitor protein) of liquid solutions containing peptides at neutral pH or less than neutral pH by the use of amino acids including proline or glycine, with or without divalent cations resulting in clear or nearly clear solutions that are stable at room temperature or preferred for pharmaceutical administration.

In one embodiment, the composition is a pharmaceutical composition of single unit or multiple unit dosage forms. Pharmaceutical compositions of single unit or multiple unit dosage forms of the invention comprise a prophylactically or therapeutically effective amount of one or more compositions (e.g., a compound of the invention, or other prophylactic or therapeutic agent), typically, one or more vehicles, carriers, or excipients, stabilizing agents, and/or preservatives. Preferably, the vehicles, carriers, excipients, stabilizing agents and preservatives are pharmaceutically acceptable.

In some embodiments, the pharmaceutical compositions and dosage forms comprise anhydrous pharmaceutical compositions and dosage forms. Anhydrous pharmaceutical compositions and dosage forms of the invention can be prepared using anhydrous or low moisture containing ingredients and low moisture or low humidity conditions. Pharmaceutical compositions and dosage forms that comprise lactose and at least one active ingredient that comprises a primary or secondary amine are preferably anhydrous if substantial contact with moisture and/or humidity during manufacturing, packaging, and/or storage is expected. An anhydrous pharmaceutical composition should be prepared and stored such that its anhydrous nature is maintained. Accordingly, anhydrous compositions are preferably packaged using materials known to prevent exposure to water such that they can be included in suitable formulary kits. Examples of suitable packaging include, but are not limited to, hermetically sealed foils, plastics, unit dose containers (e.g., vials), blister packs, and strip packs.

Suitable vehicles are well known to those skilled in the art of pharmacy, and non-limiting examples of suitable vehicles include glucose, sucrose, starch, lactose, gelatin, rice, silica gel, glycerol, talc, sodium chloride, dried skim milk, propylene glycol, water, sodium stearate, ethanol, and similar substances well known in the art. Saline solutions and aqueous dextrose and glycerol solutions can also be employed as liquid vehicles. Whether a particular vehicle is suitable for incorporation into a pharmaceutical composition or dosage form depends on a variety of factors well known in the art including, but not limited to, the way in which the dosage form will be administered to a patient and the specific active ingredients in the dosage form. Pharmaceutical vehicles can be sterile liquids, such as water and oils, including those of petroleum, animal, vegetable or synthetic origin, such as peanut oil, soybean oil, mineral oil, sesame oil and the like.

The invention also provides that a pharmaceutical composition can be packaged in a hermetically sealed container such as an ampoule or sachet indicating the quantity. In one embodiment, the pharmaceutical composition can be supplied as a dry sterilized lyophilized powder in a delivery device suitable for administration to the lower airways of a patient. The pharmaceutical compositions can, if desired, be presented in a pack or dispenser device that can contain one or more unit dosage forms containing the active ingredient. The pack can for example comprise metal or plastic foil, such as a blister pack. The pack or dispenser device can be accompanied by instructions for administration.

Methods of preparing these formulations or compositions include the step of bringing into association a compound of the present invention with the carrier and, optionally, one or more accessory ingredients. In general, the formulations are prepared by uniformly and intimately bringing into association a compound of the present invention with liquid carriers, or finely divided solid carriers, or both, and then, if necessary, shaping the product.

Formulations of the invention suitable for administration may be in the form of powders, granules, or as a solution or a suspension in an aqueous or non-aqueous liquid, or as an oil-in-water or water-in-oil liquid emulsion, or as an elixir or syrup, or as pastilles (using an inert base, such as gelatin and glycerin, or sucrose and acacia) and/or as mouthwashes and the like, each containing a predetermined amount of a compound of the present invention (e.g., peptides) as an active ingredient.

A liquid composition herein can be used as such with a delivery device, or it can be used for the preparation of pharmaceutically acceptable formulations comprising peptides that are prepared for example by the method of spray drying. The methods of spray freeze-drying peptides/proteins for pharmaceutical administration disclosed in Maa et al., Curr. Pharm. Biotechnol., 2001, 1, 283-302, are incorporated herein. In another embodiment, the liquid solutions herein are freeze spray dried and the spray-dried product is collected as a dispersible peptide-containing powder that is therapeutically effective when administered to an individual.

The compounds and pharmaceutical compositions of the present invention can be employed in combination therapies, that is, the compounds and pharmaceutical compositions can be administered concurrently with, prior to, or subsequent to, one or more other desired therapeutics or medical procedures (e.g., peptide vaccine can be used in combination therapy with another treatment such as chemotherapy, radiation, pharmaceutical agents, and/or another treatment). The particular combination of therapies (therapeutics or procedures) to employ in a combination regimen will take into account compatibility of the desired therapeutics and/or procedures and the desired therapeutic effect to be achieved. It will also be appreciated that the therapies employed may achieve a desired effect for the same disorder (for example, the compound of the present invention may be administered concurrently with another therapeutic or prophylactic).

The invention also provides a pharmaceutical pack or kit comprising one or more containers filled with one or more of the ingredients of the pharmaceutical compositions of the invention. Optionally associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use or sale for human administration.

The current invention provides for dosage forms comprising peptides suitable for treating cancer or other diseases. The dosage forms can be formulated, e.g., as sprays, aerosols, nanoparticles, liposomes, or other forms known to one of skill in the art. See, e.g., Remington's Pharmaceutical Sciences; Remington: The Science and Practice of Pharmacy supra; Pharmaceutical Dosage Forms and Drug Delivery Systems by Howard C. Ansel et al., Lippincott Williams & Wilkins; 7th edition (Oct. 1, 1999).

Generally, a dosage form used in the acute treatment of a disease may contain larger amounts of one or more of the active ingredients it comprises than a dosage form used in the chronic treatment of the same disease. In addition, the prophylactically and therapeutically effective dosage form may vary among different conditions. For example, a therapeutically effective dosage form may contain peptides that have an appropriate immunogenic action when intending to treat cancer or other disease. On the other hand, a different effective dosage may contain peptides that have an appropriate immunogenic action when intending to use the peptides of the invention as a prophylactic (e.g., vaccine) against cancer or another disease/condition. These and other ways in which specific dosage forms encompassed by this invention will vary from one another will be readily apparent to those skilled in the art. See, e.g., Remington's Pharmaceutical Sciences, 2005, Mack Publishing Co.; Remington: The Science and Practice of Pharmacy by Gennaro, Lippincott Williams & Wilkins; 20th edition (2003); Pharmaceutical Dosage Forms and Drug Delivery Systems by Howard C. Ansel et al., Lippincott Williams & Wilkins; 7th edition (Oct. 1, 1999); and Encyclopedia of Pharmaceutical Technology, edited by Swarbrick, J. & J. C. Boylan, Marcel Dekker, Inc., New York, 1988, which are incorporated herein by reference in their entirety.

The pH of a pharmaceutical composition or dosage form may also be adjusted to improve delivery and/or stability of one or more active ingredients. Similarly, the polarity of a solvent carrier, its ionic strength, or tonicity can be adjusted to improve delivery. Compounds such as stearates can also be added to pharmaceutical compositions or dosage forms to alter advantageously the hydrophilicity or lipophilicity of one or more active ingredients to improve delivery. In this regard, stearates can also serve as a lipid vehicle for the formulation, as an emulsifying agent or surfactant, and as a delivery-enhancing or penetration-enhancing agent. Different salts, hydrates, or solvates of the active ingredients can be used to adjust further the properties of the resulting composition.

Compositions can be formulated with appropriate carriers and adjuvants using techniques to yield compositions suitable for immunization. The compositions can include an adjuvant, such as, for example but not limited to, alum, poly IC, MF-59, squalene-based adjuvants, or liposomal-based adjuvants suitable for immunization.

In some embodiments, the compositions and methods comprise any suitable agent or immune modulation which could modulate mechanisms of host immune tolerance and release of the induced antibodies. In certain embodiments, an immunomodulatory agent is administered at a time and in an amount sufficient for transient modulation of the subject's immune response so as to induce an immune response which comprises antibodies against, for example, tumor neoantigens (i.e., tumor-specific antigens (TSA)).

Expression Systems

In certain aspects, the invention provides culturing a cell line that expresses any one of the peptides of the invention in a culture medium comprising any of the peptides described herein.

Various expression systems for producing recombinant proteins/peptides are known in the art, and include prokaryotic (e.g., bacteria), plant, insect, yeast, and mammalian expression systems. Suitable cell lines can be transformed, transduced, or transfected with nucleic acids containing coding sequences for the peptides of the invention in order to produce the molecule of interest. Expression vectors containing such a nucleic acid sequence, which can be linked to at least one regulatory sequence in a manner that allows expression of the nucleotide sequence in a host cell, can be introduced via methods known in the art. Practitioners in the art understand that designing an expression vector can depend on factors, such as the choice of host cell to be transfected and/or the type and/or amount of desired protein to be expressed. Enhancer regions, which are those sequences found upstream or downstream of the promoter region in non-coding DNA regions, are also known in the art to be important in optimizing expression. If needed, origins of replication from viral sources can be employed, such as if a prokaryotic host is utilized for introduction of plasmid DNA. However, in eukaryotic organisms, chromosome integration is a common mechanism for DNA replication. For stable transfection of mammalian cells, a small fraction of cells can integrate introduced DNA into their genomes. The expression vector and transfection method utilized can be factors that contribute to a successful integration event. For stable amplification and expression of a desired protein, a vector containing DNA encoding a protein of interest is stably integrated into the genome of eukaryotic cells (for example mammalian cells), resulting in the stable expression of transfected genes. A gene that encodes a selectable marker (for example, resistance to antibiotics or drugs) can be introduced into host cells along with the gene of interest in order to identify and select clones that stably express a gene encoding a protein of interest. Cells containing the gene of interest can be identified by drug selection wherein cells that have incorporated the selectable marker gene will survive in the presence of the drug. Cells that have not incorporated the gene for the selectable marker die. Surviving cells can then be screened for the production of the desired protein molecule.

A host cell strain that modulates the expression of the inserted sequences, or modifies and processes the nucleic acid in the specific fashion desired, also may be chosen. Such modifications (for example, glycosylation and other post-translational modifications) and processing (for example, cleavage) of peptide/protein products may be important for the function of the peptide/protein. Different host cell strains have characteristic and specific mechanisms for the post-translational processing and modification of proteins and gene products. As such, appropriate host systems or cell lines can be chosen to ensure the correct modification and processing of the target protein expressed. Thus, eukaryotic host cells possessing the cellular machinery for proper processing of the primary transcript, glycosylation, and phosphorylation of the gene product may be used.

Various culturing parameters can be used with respect to the host cell being cultured. Appropriate culture conditions for mammalian cells are well known in the art (Cleveland W L, et al., J Immunol Methods, 1983, 56(2): 221-234) or can be determined by the skilled artisan (see, for example, Animal Cell Culture: A Practical Approach 2nd Ed., Rickwood, D. and Hames, B. D., eds. (Oxford University Press: New York, 1992)). Cell culturing conditions can vary according to the type of host cell selected. Commercially available medium can be utilized.

Peptides of the invention can be purified from any human or non-human cell which expresses the polypeptide, including those which have been transfected with expression constructs that express peptides of the invention. For protein recovery, isolation and/or purification, the cell culture medium or cell lysate is centrifuged to remove particulate cells and cell debris. The desired polypeptide molecule is isolated or purified away from contaminating soluble proteins and polypeptides by suitable purification techniques. Non-limiting purification methods for proteins include: size exclusion chromatography; affinity chromatography; ion exchange chromatography; ethanol precipitation; reverse phase HPLC; chromatography on a resin, such as silica, or cation exchange resin, e.g., DEAE; chromatofocusing; SDS-PAGE; ammonium sulfate precipitation; gel filtration using, e.g., Sephadex G-75, Sepharose; protein A sepharose chromatography for removal of immunoglobulin contaminants; and the like. Other additives, such as protease inhibitors (e.g., PMSF or proteinase K) can be used to inhibit proteolytic degradation during purification. Purification procedures that can select for carbohydrates can also be used, e.g., ion-exchange soft gel chromatography, or HPLC using cation- or anion-exchange resins, in which the more acidic fraction(s) is/are collected.

Methods of Treatment

In one embodiment, the subject matter disclosed herein relates to a preventive medical treatment started following diagnosis of cancer in order to prevent the disease from worsening or to cure the disease. In one embodiment, the subject matter disclosed herein relates to prophylaxis of subjects who are believed to be at risk for cancer or have previously been diagnosed with cancer (or another disease). In one embodiment, said subjects can be administered the peptide vaccine described herein or pharmaceutical compositions thereof. The invention contemplates using any of the peptides produced by the systems and methods described herein. In one embodiment, the peptide vaccines described herein can be administered subcutaneously via syringe or any other suitable method known in the art.

The compound(s) or combination of compounds disclosed herein, or pharmaceutical compositions may be administered to a cell, mammal, or human by any suitable means. Non-limiting examples of methods of administration include, among others, (a) administration through oral pathways, which includes administration in capsule, tablet, granule, spray, syrup, or other such forms; (b) administration through non-oral pathways such as intraocular, intranasal, intraauricular, rectal, vaginal, intraurethral, transmucosal, buccal, or transdermal, which includes administration as an aqueous suspension, an oily preparation or the like or as a drip, spray, suppository, salve, ointment or the like; (c) administration via injection, including subcutaneously, intraperitoneally, intravenously, intramuscularly, intradermally, intraorbitally, intracapsularly, intraspinally, intrasternally, or the like, including infusion pump delivery; (d) administration locally such as by injection directly in the renal or cardiac area, e.g., by depot implantation; (e) administration topically, as deemed appropriate by those of skill in the art for bringing the compound or combination of compounds disclosed herein into contact with living tissue; (f) administration via inhalation, including through aerosolized, nebulized, and powdered formulations; and (g) administration through implantation.

As will be readily apparent to one skilled in the art, the effective in vivo dose to be administered and the particular mode of administration will vary depending upon the age, weight and species treated, and the specific use for which the compound or combination of compounds disclosed herein is employed. The determination of effective dose levels, that is the dose levels necessary to achieve the desired result, can be accomplished by one skilled in the art using routine pharmacological methods. Typically, human clinical applications of products are commenced at lower dose levels, with dose level being increased until the desired effect is achieved. Alternatively, acceptable in vitro studies can be used to establish useful doses and routes of administration of the compositions identified by the present methods using established pharmacological methods. Effective animal doses from in vivo studies can be converted to appropriate human doses using conversion methods known in the art (e.g., see Nair A B, Jacob S. A simple practice guide for dose conversion between animals and human. Journal of basic and clinical pharmacy. 2016 March; 7(2):27).
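By way of non-limiting illustration, one conversion method known in the art is body-surface-area scaling, in which a human equivalent dose (HED) in mg/kg is estimated as the animal dose in mg/kg multiplied by the ratio of the animal Km factor to the human Km factor. The following sketch assumes that method and commonly tabulated Km values; the function name and the dictionary of factors are illustrative assumptions and are not prescribed by this disclosure.

```python
# Illustrative sketch only: body-surface-area (Km) dose scaling.
# The Km factors below are commonly tabulated values (body weight in kg
# divided by body surface area in m^2) and are assumptions for illustration.
KM_FACTORS = {
    "mouse": 3.0,
    "rat": 6.0,
    "rabbit": 12.0,
    "dog": 20.0,
    "human": 37.0,
}


def human_equivalent_dose(animal_dose_mg_per_kg: float, species: str) -> float:
    """Estimate a human equivalent dose (mg/kg) from an animal dose (mg/kg)."""
    if animal_dose_mg_per_kg < 0:
        raise ValueError("dose must be non-negative")
    if species not in KM_FACTORS:
        raise KeyError(f"no Km factor tabulated for species: {species}")
    # HED = animal dose x (animal Km / human Km)
    return animal_dose_mg_per_kg * KM_FACTORS[species] / KM_FACTORS["human"]


if __name__ == "__main__":
    # Hypothetical input: a 10 mg/kg dose observed to be effective in mice.
    hed = human_equivalent_dose(10.0, "mouse")
    print(f"Estimated HED: {hed:.2f} mg/kg")
```

An estimate of this kind would serve only as a starting point; as noted above, clinical dosing is typically commenced at lower levels and increased until the desired effect is achieved.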

Methods of Prevention

In some embodiments, the peptides prepared using methods of the invention can be used as a vaccine to promote an immune response against cancer (e.g., against tumor neoantigens). In some embodiments, the invention provides compositions and methods for induction of immune response, for example induction of antibodies to tumor neoantigens. In some embodiments, the antibodies are broadly neutralizing antibodies. In some embodiments, the peptides prepared using methods of the invention can be used as a vaccine to promote an immune response against a pathogen. In some embodiments, the peptides prepared using methods of the invention can be used to promote immune tolerance as an autoimmune disease therapeutic.

The compositions, systems, and methods disclosed herein are not to be limited in scope to the specific embodiments described herein. Indeed, various modifications of the compositions, systems, and methods in addition to those described will become apparent to those of skill in the art from the foregoing description.

What is claimed is:
 1. A method of forming an immunogenic peptide composition, the method comprising: using a processor to perform the steps of: determining a plurality of peptide-HLA binding scores for a first peptide sequence, wherein the first peptide sequence is associated with a tumor neoantigen, a pathogen proteome, or a self-protein; determining whether the first peptide sequence has a peptide-HLA binding score that passes a threshold with respect to at least one HLA allele; creating a first peptide set comprising at least two modified peptide sequences that each comprise a substitution of at least one amino acid residue of the first peptide sequence; determining a plurality of peptide-HLA binding scores for each peptide sequence in the first peptide set; and creating a second peptide set by selecting a subset of the first peptide set, wherein the selecting comprises excluding a peptide-HLA binding score with respect to a first HLA allele for a modified peptide sequence of the at least two modified peptide sequences if a peptide-HLA binding score for the first peptide sequence does not pass the threshold with respect to the first HLA allele; performing an experimental assay to obtain a peptide-HLA immunogenicity metric for at least one peptide sequence of the second peptide set; and forming an immunogenic peptide composition comprising the at least one peptide sequence of the second peptide set for which the experimental assay was performed.
 2. The method of claim 1, wherein selecting the subset of the first peptide set further comprises computing a population coverage.
 3. The method of claim 2, wherein the selected subset of the first peptide set has a population coverage that corresponds to a proportion of a human population of at least about 0.7.
 4. The method of claim 2, wherein the population coverage is computed with respect to at least three HLA alleles.
 5. The method of claim 2, wherein the population coverage is computed based on a frequency of an HLA allele in a human population.
 6. The method of claim 1, wherein selecting the subset of the first peptide set further comprises computing a predicted vaccine performance.
 7. The method of claim 6, wherein computing the predicted vaccine performance is based on an HLA type of a subject.
 8. The method of claim 1, wherein each peptide sequence of the first peptide set binds to an HLA class I molecule or an HLA class II molecule.
 9. The method of claim 1, further comprising excluding from the second peptide set a peptide sequence with a predicted binding core that contains a target amino acid residue in an anchor position.
 10. The method of claim 1, wherein the threshold is a binding affinity of less than about 1000 nM.
 11. The method of claim 1, wherein the immunogenic peptide composition comprises nucleic acid sequences encoding an amino acid sequence of the at least one peptide sequence of the second peptide set.
 12. A method of forming an immunogenic peptide composition, the method comprising: using a processor to perform the steps of: determining a plurality of peptide-HLA immunogenicity metrics for a first peptide sequence, wherein the first peptide sequence is associated with a tumor neoantigen, a pathogen proteome, or a self-protein; determining whether the first peptide sequence has a peptide-HLA immunogenicity metric that passes a threshold with respect to at least one HLA allele; creating a first peptide set comprising at least two modified peptide sequences that each comprise a substitution of at least one amino acid residue of the first peptide sequence; determining a plurality of peptide-HLA immunogenicity metrics for each peptide sequence in the first peptide set; and creating a second peptide set by selecting a subset of the first peptide set, wherein the selecting comprises excluding a peptide-HLA immunogenic metric with respect to a first HLA allele for a modified peptide sequence of the at least two modified peptide sequences if a peptide-HLA immunogenicity metric for the first peptide sequence does not pass the threshold with respect to the first HLA allele; performing an experimental assay to obtain a peptide-HLA immunogenicity metric for at least one peptide sequence of the second peptide set; and forming an immunogenic peptide composition comprising the at least one peptide sequence of the second peptide set for which the experimental assay was performed.
 13. The method of claim 12, wherein selecting the subset of the first peptide set further comprises computing a population coverage.
 14. The method of claim 13, wherein the selected subset of the first peptide set has a population coverage that corresponds to a proportion of a human population of at least about 0.7.
 15. The method of claim 13, wherein the population coverage is computed with respect to at least three HLA alleles.
 16. The method of claim 13, wherein the population coverage is computed based on a frequency of an HLA allele in a human population.
 17. The method of claim 12, wherein selecting the subset of the first peptide set further comprises computing a predicted vaccine performance.
 18. The method of claim 17, wherein computing the predicted vaccine performance is based on an HLA type of a subject.
 19. The method of claim 12, wherein each peptide sequence of the first peptide set binds to an HLA class I molecule or an HLA class II molecule.
 20. The method of claim 12, further comprising excluding from the second peptide set a peptide sequence with a predicted binding core that contains a target amino acid residue in an anchor position.
 21. The method of claim 12, wherein the threshold is a binding affinity of less than about 1000 nM.
 22. The method of claim 12, wherein the immunogenic peptide composition comprises nucleic acid sequences encoding an amino acid sequence of the at least one peptide sequence of the second peptide set.
 23. A method of forming an immunogenic peptide composition, the method comprising: using a processor to perform the steps of: determining a plurality of peptide-HLA binding scores for a first peptide sequence, wherein the first peptide sequence is associated with a tumor neoantigen, a pathogen proteome, or a self-protein; creating a first peptide set comprising at least three modified peptide sequences that each comprise a substitution of at least one amino acid residue of the first peptide sequence; determining a plurality of peptide-HLA binding scores for each peptide sequence in the first peptide set; determining whether each of the at least three modified peptide sequences has a peptide-HLA binding score that passes a threshold with respect to at least one HLA allele; and creating a second peptide set by selecting a subset of the first peptide set, wherein the selecting comprises excluding a peptide-HLA binding score with respect to a first HLA allele for a modified peptide sequence of the at least three modified peptide sequences if a peptide-HLA binding score for the modified peptide sequence does not pass the threshold with respect to the first HLA allele; performing an experimental assay to obtain a peptide-HLA immunogenicity metric for at least two peptide sequences of the second peptide set; and forming an immunogenic peptide composition comprising the at least two peptide sequences of the second peptide set for which the experimental assay was performed.
 24. The method of claim 23, wherein selecting the subset of the first peptide set further comprises computing a population coverage, wherein the population coverage is computed with respect to at least three HLA alleles, and wherein the population coverage is computed based on a frequency of an HLA haplotype in a human population.
 25. The method of claim 23, wherein selecting the subset of the first peptide set further comprises computing a predicted vaccine performance.
 26. The method of claim 25, wherein computing the predicted vaccine performance is based on an HLA type of a subject.
 27. The method of claim 23, wherein each peptide sequence of the first peptide set binds to an HLA class I molecule or an HLA class II molecule.
 28. The method of claim 23, further comprising excluding from the second peptide set a peptide sequence with a predicted binding core that contains a target amino acid residue in an anchor position.
 29. The method of claim 23, wherein the threshold is a binding affinity of less than about 50 nM.
 30. The method of claim 23, wherein the immunogenic peptide composition comprises nucleic acid sequences encoding an amino acid sequence of the at least one peptide sequence of the second peptide set.
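
By way of non-limiting illustration only, the processor-performed steps recited in claims 1 to 5 may be sketched as follows. The sketch is an assumption-laden example and does not limit the claims: the peptide identifiers, binding scores, and allele frequencies are hypothetical placeholder data; the coverage formula assumes a single HLA locus with independent inheritance of the two chromosomes; and the greedy selection is one of many possible ways to select a subset. The experimental assay step is not computational and is outside the sketch.

```python
from typing import Dict, List

# Scores map an HLA allele name to a predicted binding affinity in nM
# (lower means stronger binding). All data here is hypothetical.
Scores = Dict[str, float]


def passes(score_nm: float, threshold_nm: float = 1000.0) -> bool:
    """A score passes when the affinity is below the threshold."""
    return score_nm < threshold_nm


def mask_by_base(base: Scores, variants: Dict[str, Scores],
                 threshold_nm: float = 1000.0) -> Dict[str, Scores]:
    """Exclude a modified peptide's score for any allele at which the
    unmodified (first) peptide does not itself pass the threshold."""
    allowed = {a for a, s in base.items() if passes(s, threshold_nm)}
    return {pep: {a: s for a, s in scores.items() if a in allowed}
            for pep, scores in variants.items()}


def population_coverage(selected: List[str], masked: Dict[str, Scores],
                        allele_freq: Dict[str, float],
                        threshold_nm: float = 1000.0) -> float:
    """Fraction of diploid individuals carrying at least one covered allele.
    Simplifying assumption: one locus, two independently inherited copies."""
    covered = {a for pep in selected
               for a, s in masked[pep].items() if passes(s, threshold_nm)}
    f = sum(allele_freq.get(a, 0.0) for a in covered)
    return 1.0 - (1.0 - f) ** 2


def greedy_select(masked: Dict[str, Scores], allele_freq: Dict[str, float],
                  target: float, threshold_nm: float = 1000.0) -> List[str]:
    """Add the peptide that most increases coverage until the target is met
    or no remaining peptide improves coverage."""
    selected: List[str] = []
    remaining = set(masked)
    current = 0.0
    while remaining and current < target:
        best = max(sorted(remaining), key=lambda p: population_coverage(
            selected + [p], masked, allele_freq, threshold_nm))
        new = population_coverage(selected + [best], masked,
                                  allele_freq, threshold_nm)
        if new <= current:
            break
        selected.append(best)
        remaining.remove(best)
        current = new
    return selected


# Hypothetical example data (not taken from this disclosure).
base_scores = {"HLA-A*02:01": 400.0, "HLA-A*24:02": 5000.0, "HLA-A*01:01": 800.0}
variant_scores = {
    "variant_1": {"HLA-A*02:01": 50.0, "HLA-A*24:02": 30.0, "HLA-A*01:01": 2000.0},
    "variant_2": {"HLA-A*02:01": 3000.0, "HLA-A*24:02": 20.0, "HLA-A*01:01": 100.0},
}
allele_freq = {"HLA-A*02:01": 0.25, "HLA-A*24:02": 0.15, "HLA-A*01:01": 0.10}

masked = mask_by_base(base_scores, variant_scores)
chosen = greedy_select(masked, allele_freq, target=0.5)
coverage = population_coverage(chosen, masked, allele_freq)
```

In this example the scores of both modified peptides for HLA-A*24:02 are excluded, because the unmodified peptide does not pass the threshold at that allele, even though the modified peptides are predicted to bind it strongly.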